torch.compile issue when computing features on multiple GPUs (nn.DataParallel) #889
Labels: stale (Old PRs/Issues which are inactive)
Description
I am computing features on multiple GPUs on the same node using `DeepFeatureExtractor`. My feature-extraction code is essentially the same as in the new notebook demonstrating the feature extraction process: #887
What I Did
The `nn.DataParallel` built into tiatoolbox handles the multi-GPU computation. I pulled the changes that introduced `torch.compile` and switched from `ON_GPU` to `device`: I updated the argument in `DeepFeatureExtractor`'s `predict` method to use `device` instead of `on_gpu`.

The full error traceback is too long to paste in its entirety, but here are some of the errors (from a single run).
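For context, the failing pattern can be reduced to a plain-PyTorch sketch (hypothetical, not tiatoolbox code): a model passed through `torch.compile` and then wrapped in `nn.DataParallel`.

```python
import torch
import torch.nn as nn

# Toy stand-in for the feature-extraction model; the real tiatoolbox model
# and data are not needed to illustrate the wrapping order.
model = nn.Linear(4, 2)

# backend="eager" keeps this sketch runnable on a CPU-only machine; on a
# multi-GPU node tiatoolbox would compile with the default (inductor)
# backend, which is where the errors were observed.
compiled = torch.compile(model, backend="eager")

# Wrapping the compiled module in DataParallel, the order that appears to
# trigger the issue. Without CUDA devices, DataParallel simply falls back
# to a plain forward pass.
parallel = nn.DataParallel(compiled)

out = parallel(torch.randn(8, 4))
print(out.shape)  # torch.Size([8, 2])
```

On a CPU-only machine this runs fine because `DataParallel` never scatters the module; the reported errors only surface when replicas are created across two or more GPUs.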
What I can gather is that `torch.compile` is not working well with `nn.DataParallel`.

Please let me know if you can reproduce the error by simply running the `DeepFeatureExtractor` feature extraction code with `rcParam["torch_compile_mode"] = "default"` on a node with at least two devices.

Maybe `nn.DistributedDataParallel` is a better option to use: https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead

tiatoolbox/tiatoolbox/models/models_abc.py
Lines 42 to 61 in 5f1cecb
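The `nn.DistributedDataParallel` alternative suggested above could be sketched as follows. This is a minimal single-process, CPU-only illustration (gloo backend, hypothetical port), not something tiatoolbox currently exposes:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal single-process process group so DDP can be constructed without
# launching multiple workers; address/port are arbitrary for this sketch.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

# Wrap with DDP first, then compile the wrapper -- one ordering that works
# with torch.compile. backend="eager" keeps the sketch runnable without a
# GPU or a C++ toolchain.
model = DDP(nn.Linear(4, 2))
compiled = torch.compile(model, backend="eager")

out = compiled(torch.randn(8, 4))
dist.destroy_process_group()
```

In a real multi-GPU setting each process would be launched (e.g. via `torchrun`) with its own rank and device, rather than the single-rank group used here.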