
@vealocia

Fall back getattr to _orig_mod for ShardedModule.

@google-cla

google-cla bot commented Jul 11, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@vlasenkoalexey
Collaborator

Hi @vealocia, could you please explain why this is necessary?

@vealocia
Author

vealocia commented Jul 16, 2025

Yes, sure.
TorchXLA's FSDP has a similar implementation: https://github.com/pytorch/xla/blob/36ff641d17d79013fa0a49dc29d5d7ed8fabb8df/torch_xla/distributed/fsdp/xla_fully_sharded_data_parallel.py#L824, which falls back to the wrapped module for any attribute not found on the wrapper.
There are many cases where we wrap a module with the FSDP or SPMD shard wrapper but want the original code to keep working unchanged. For example, in Hugging Face's original Qwen2 language model implementation, every forward pass reads the decoder's attention_type attribute: https://github.com/huggingface/transformers/blob/0dc2df5ddafe3cb5824ad24e85beba13e0aa6726/src/transformers/models/qwen2/modeling_qwen2.py#L405C52-L405C65. We want our code to behave the same whether or not the module is wrapped.
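As a rough illustration of the behavior described above, here is a minimal sketch in plain PyTorch. `ShardedModuleSketch` and `DecoderLayer` are hypothetical stand-ins, not this project's actual `ShardedModule` or Hugging Face's Qwen2 decoder layer, and no sharding happens here. The pattern mirrors the TorchXLA FSDP code linked above: try `nn.Module.__getattr__` first, and only on `AttributeError` forward the lookup to the wrapped `_orig_mod`.

```python
import torch
from torch import nn


class ShardedModuleSketch(nn.Module):
    """Hypothetical wrapper that keeps the wrapped module as `_orig_mod`.

    Only illustrates attribute fallback; it does not shard anything.
    """

    def __init__(self, orig_mod: nn.Module):
        super().__init__()
        # Registered as a submodule, so nn.Module.__getattr__ can resolve it.
        self._orig_mod = orig_mod

    def forward(self, *args, **kwargs):
        return self._orig_mod(*args, **kwargs)

    def __getattr__(self, name: str):
        try:
            # nn.Module.__getattr__ resolves parameters, buffers and
            # submodules (including `_orig_mod` itself).
            return super().__getattr__(name)
        except AttributeError:
            if name == "_orig_mod":
                # Avoid infinite recursion if `_orig_mod` was never set.
                raise
            # Fall back to the wrapped module for everything else.
            return getattr(self._orig_mod, name)


class DecoderLayer(nn.Module):
    """Hypothetical layer exposing a plain (non-tensor) attribute."""

    def __init__(self):
        super().__init__()
        self.attention_type = "full_attention"
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.proj(x)


layer = ShardedModuleSketch(DecoderLayer())
print(layer.attention_type)  # full_attention
print(layer.proj.in_features)  # 4
out = layer(torch.zeros(2, 4))
print(tuple(out.shape))  # (2, 4)
```

Calling `super().__getattr__` first matters because `nn.Module` keeps parameters, buffers and submodules outside the instance `__dict__` and resolves them in its own `__getattr__`. The explicit `_orig_mod` guard keeps a wrapper whose `_orig_mod` was never assigned from recursing until Python raises `RecursionError`.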

vlasenkoalexey self-requested a review on July 22, 2025 03:25
@vlasenkoalexey
Collaborator

Please sign the CLA at https://cla.developers.google.com/ before submitting, and make sure the tests pass.
