
Potential dataloader memory leak and problems with multi-gpu training.  #21

Open
@chophilip21

Description


Hi, first of all, thanks a lot for the great repo. All the models provided in the repo are very easy to use.

I have noticed a few problems with training, and I wanted to bring them to your attention.

The first issue is regarding multi-GPU training. I have two GPUs with 24GB of VRAM each. I have tried this:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

But setup_ddp() fails, and the traceback points at int(os.environ(['LOCAL_RANK'])) with the following error:

TypeError: '_Environ' object is not callable
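If it helps, I suspect the problem is just how LOCAL_RANK is read: os.environ is a mapping, so it needs subscript access rather than being called like a function. Below is a minimal sketch of what I would expect setup_ddp() to do; the body is my guess, not the repo's actual implementation.

```python
import os
import torch
import torch.distributed as dist

def setup_ddp():
    # Broken: int(os.environ(['LOCAL_RANK'])) raises
    #   TypeError: '_Environ' object is not callable
    # because os.environ is a mapping, not a callable.
    local_rank = int(os.environ['LOCAL_RANK'])  # set by torch.distributed.launch --use_env
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend='nccl', init_method='env://')
    return local_rank
```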

When I try training with the single-GPU command, things do run fine, but the dataloader crashes after a few epochs.

Epoch: [1/200] Iter: [4/299] LR: 0.00010241 Loss: 10.58329177:   1%|▊                                                                | 4/299 [00:18<14:23,  2.93s/it]Killed
(detection) philip@philip-Z390-UD: seg_library/tools$ /home/philip/anaconda3/envs/detection/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

The above issue can only be avoided when I do one of the following:

  1. Force num_workers to 0 instead of using mp.cpu_count() (which is super slow)
  2. Or make the batch size very small, which also slows down training.

When the dataloader crashes, it freezes my entire computer, and I am wondering if you have any idea how to fix the above issue.
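For reference, this is roughly the DataLoader configuration I have to fall back to right now to keep training alive; train_dataset and the batch size are placeholders, the real values come from the config file and tools/train.py.

```python
from torch.utils.data import DataLoader

# Workaround that avoids the crash on my machine, at the cost of speed:
# no worker processes and a small batch size.
train_loader = DataLoader(
    train_dataset,   # placeholder for the dataset built in tools/train.py
    batch_size=2,    # much smaller than the config default
    num_workers=0,   # mp.cpu_count() workers eventually exhaust RAM and get killed
    pin_memory=True,
    drop_last=True,
)
```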
