I'm experiencing an issue while managing a large-scale embedding vocabulary with TorchRec's `EmbeddingCollection`. Specifically, when I configure the `EmbeddingShardingPlanner` with the `row_wise` sharding type and fused compute kernels, the resulting `DistributedModelParallel` instance sets `requires_grad` to `False` on the embedding parameters, so the embedding weights receive no gradient updates during training.
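
For reference, here is a minimal sketch of roughly how I set things up (the table name, dimensions, and process-group handling below are placeholders, not my exact configuration):

```python
import torch
import torch.distributed as dist
from torchrec import EmbeddingCollection, EmbeddingConfig
from torchrec.distributed.embedding import EmbeddingCollectionSharder
from torchrec.distributed.embedding_types import EmbeddingComputeKernel
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Placeholder table config; my real vocabulary is much larger.
tables = [
    EmbeddingConfig(
        name="large_table",
        embedding_dim=128,
        num_embeddings=100_000_000,
        feature_names=["feature_0"],
    )
]
model = EmbeddingCollection(tables=tables, device=torch.device("meta"))

# Constrain the planner to row-wise sharding with the fused kernel.
constraints = {
    "large_table": ParameterConstraints(
        sharding_types=[ShardingType.ROW_WISE.value],
        compute_kernels=[EmbeddingComputeKernel.FUSED.value],
    )
}
planner = EmbeddingShardingPlanner(
    topology=Topology(
        world_size=dist.get_world_size(), compute_device="cuda"
    ),
    constraints=constraints,
)
plan = planner.collective_plan(
    model, [EmbeddingCollectionSharder()], dist.GroupMember.WORLD
)

dmp = DistributedModelParallel(
    module=model, plan=plan, device=torch.device("cuda")
)

# Every embedding parameter prints requires_grad == False here:
for name, param in dmp.named_parameters():
    print(name, param.requires_grad)
```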
