- TIA Toolbox version: develop branch
- Python version: 3.11.8
- Operating System: linux
Description
I am computing features on multiple GPUs on the same node using `DeepFeatureExtractor`. My code for extracting features is essentially the same as in the new notebook demonstrating the feature-extraction process: #887
What I Did
The `nn.DataParallel` wrapper built into tiatoolbox handles the multi-GPU computation. I pulled the changes that introduced `torch.compile` and switched from `ON_GPU` to using `device`, updating the argument in the `DeepFeatureExtractor`'s `predict` method to use `device` instead of `on_gpu`.
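For context, here is a minimal sketch of the call I am describing. The slide path, backbone, batch size, and I/O settings are placeholders rather than my exact values; the overall structure follows the notebook from #887.

```python
from tiatoolbox.models import DeepFeatureExtractor, IOSegmentorConfig
from tiatoolbox.models.architecture.vanilla import CNNBackbone

# Placeholder backbone; any CNN feature extractor would do here.
model = CNNBackbone("resnet50")

# Placeholder resolutions/patch sizes, not the values from my run.
ioconfig = IOSegmentorConfig(
    input_resolutions=[{"units": "mpp", "resolution": 0.5}],
    output_resolutions=[{"units": "mpp", "resolution": 0.5}],
    patch_input_shape=[224, 224],
    patch_output_shape=[224, 224],
    stride_shape=[224, 224],
)

extractor = DeepFeatureExtractor(
    batch_size=32,
    model=model,
    num_loader_workers=4,
)
output = extractor.predict(
    ["sample_wsi.svs"],  # placeholder slide path
    mode="wsi",
    ioconfig=ioconfig,
    save_dir="features/",
    device="cuda",  # new-style argument, replacing on_gpu=True
)
```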
The error traceback is too long to paste in full, but here are some of the errors (from a single run):
File "/tmp/torchinductor_qun786/vv/cvvkeueuq2m4jcjzub4hcfpkhpogtc5b2xddykdgxvsxcvnpfa2w.py", line 173, in call
buf2 = extern_kernels.convolution(buf0, buf1, stride=(14, 14), padding=(0, 0), dilation=(1, 1), transposed=False, output_padding=(0, 0), groups=1, bias=Non
e)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in
method wrapper_CUDA__cudnn_convolution)
...
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
...
RuntimeError: Triton Error [CUDA]: invalid device context
What I can gather is that `torch.compile` is not working well with `nn.DataParallel`.
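To illustrate what I suspect is happening, here is a standalone sketch (not tiatoolbox code; the model is arbitrary and the wrapping order inside tiatoolbox may differ) that combines the two:

```python
import torch
import torchvision

model = torchvision.models.resnet18().to("cuda:0")
model = torch.compile(model)          # graph gets specialized while on cuda:0
model = torch.nn.DataParallel(model)  # replicas are scattered to cuda:0, cuda:1, ...

x = torch.randn(8, 3, 224, 224, device="cuda:0")
y = model(x)  # replicas on devices other than cuda:0 can hit errors like those above
```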
Please let me know if you can reproduce the error by simply running the `DeepFeatureExtractor` feature-extraction code with `rcParam["torch_compile_mode"] = "default"` on a node with at least 2 devices.
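For reproduction, the only tiatoolbox setting I changed is the one below; the `CUDA_VISIBLE_DEVICES` hint is just one example of exposing two GPUs:

```python
from tiatoolbox import rcParam

rcParam["torch_compile_mode"] = "default"  # enable torch.compile in tiatoolbox
# ...then run the DeepFeatureExtractor sketch above on a node where at
# least two GPUs are visible (e.g. CUDA_VISIBLE_DEVICES=0,1).
```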
Maybe `nn.DistributedDataParallel` is a better option to use: https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead
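If it helps, here is a minimal standalone DDP sketch (not the tiatoolbox API; the model is an arbitrary stand-in, and compiling after the DDP wrap is one common pattern), launched with `torchrun --nproc_per_node=2 script.py`:

```python
import os

import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    dist.init_process_group("nccl")  # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torchvision.models.resnet18().to(local_rank)
    model = DDP(model, device_ids=[local_rank])
    model = torch.compile(model)  # one compiled graph per process, one device each

    x = torch.randn(8, 3, 224, 224, device=local_rank)
    _ = model(x)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```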
Relevant code: `tiatoolbox/models/models_abc.py`, lines 42 to 61 at commit `5f1cecb`.