Labels: enhancement (New feature or request)
Description
If I understand correctly, for a more precise setup of TP within the FSDP2 framework, parallelize_module(...) must be called explicitly with a tp_plan (https://docs.pytorch.org/tutorials/intermediate/TP_tutorial.html#combine-tensor-parallel-with-fully-sharded-data-parallel-together); an example is at https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py.
However, as far as I can tell, Verl/Trinity do not prepare any explicit TP plan and never invoke parallelize_module(...). Am I correct that TP is not currently supported under the FSDP/FSDP2 trainer?
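For reference, this is roughly the explicit setup the tutorial describes: a minimal toy sketch, not Verl/Trinity code. The model, the module names in the plan, and the mesh sizes are placeholders I made up for illustration.

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyBlock(nn.Module):
    """Stand-in for a transformer block; layer names are illustrative only."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, 4 * dim, bias=False)
        self.w2 = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


# 2D mesh: "dp" dim for FSDP2 sharding, "tp" dim for tensor parallelism.
# Assumes launch via torchrun with dp_size * tp_size ranks.
dp_size, tp_size = 2, 4
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

model = nn.ModuleList([ToyBlock() for _ in range(4)]).cuda()

# Explicit per-block TP plan: shard w1 column-wise and w2 row-wise.
tp_plan = {"w1": ColwiseParallel(), "w2": RowwiseParallel()}
for block in model:
    parallelize_module(block, mesh["tp"], tp_plan)

# Then wrap with FSDP2 (fully_shard) over the "dp" mesh dimension.
for block in model:
    fully_shard(block, mesh=mesh["dp"])
fully_shard(model, mesh=mesh["dp"])
```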
Thanks!