Labels: enhancement (New feature or request)
Description
If I understand correctly, for a more precise setup of TP within the FSDP2 framework, parallelize_module(...) must be called explicitly with a tp_plan (https://docs.pytorch.org/tutorials/intermediate/TP_tutorial.html#combine-tensor-parallel-with-fully-sharded-data-parallel-together); an example is at https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py.
However, as far as I can tell, Verl/Trinity do not prepare any explicit TP plan and never invoke parallelize_module(...). Am I correct that TP is not currently supported under the FSDP/FSDP2 trainer?
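For reference, this is roughly the explicit setup the tutorial describes: a minimal toy sketch, not Verl/Trinity code. The model, the module names in the plan, and the mesh sizes are placeholders I made up for illustration.

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyBlock(nn.Module):
    """Stand-in for a transformer block; layer names are illustrative only."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, 4 * dim, bias=False)
        self.w2 = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


# 2D mesh: "dp" dim for FSDP2 sharding, "tp" dim for tensor parallelism.
# Assumes launch via torchrun with dp_size * tp_size ranks.
dp_size, tp_size = 2, 4
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

model = nn.ModuleList([ToyBlock() for _ in range(4)]).cuda()

# Explicit per-block TP plan: shard w1 column-wise and w2 row-wise.
tp_plan = {"w1": ColwiseParallel(), "w2": RowwiseParallel()}
for block in model:
    parallelize_module(block, mesh["tp"], tp_plan)

# Then wrap with FSDP2 (fully_shard) over the "dp" mesh dimension.
for block in model:
    fully_shard(block, mesh=mesh["dp"])
fully_shard(model, mesh=mesh["dp"])
```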
Thanks!