Failed to load santacoder model with transformer 4.51.3, it's a similar issue like #37737 #37765

Closed · 8 tasks
nv-guomingz opened this issue Apr 24, 2025 · 1 comment · Fixed by #37790

@nv-guomingz (Contributor) commented Apr 24, 2025

System Info

Hi, I just installed transformers 4.51.3 on a Linux system with an NVIDIA GPU.

When I ran the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('bigcode/santacoder', device_map='auto', trust_remote_code=True)

I also set the following environment variables:

export WORLD_SIZE=1
export RANK=0
export LOCAL_RANK=0
export MASTER_PORT=63563
export MASTER_ADDR=`your localhost` 
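
These are the torchrun-style variables that frameworks inspect to decide whether a process is part of a distributed launch. A sketch of how they are read (an assumption about the general mechanism, not the exact transformers code):

```python
import os

# With WORLD_SIZE set in the environment, transformers 4.51.3 takes the
# tensor-parallel loading path even for a single-process run.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
rank = int(os.environ.get("RANK", "0"))
looks_distributed = "WORLD_SIZE" in os.environ  # set => distributed-style launch
print(f"world_size={world_size} rank={rank} distributed={looks_distributed}")
```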

I got the error message below:

[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4401, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4830, in _load_pretrained_model
[rank0]:     disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
[rank0]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 777, in _load_state_dict_into_meta_model
[rank0]:     shard_and_distribute_module(
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/integrations/tensor_parallel.py", line 669, in shard_and_distribute_module
[rank0]:     param = torch.nn.Parameter(param)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/parameter.py", line 46, in __new__
[rank0]:     return torch.Tensor._make_subclass(cls, data, requires_grad)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Only Tensors of floating point and complex dtype can require gradients
[rank0]:[W424 16:00:01.852231233 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
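
The failing line wraps a checkpoint tensor in `torch.nn.Parameter`, whose `requires_grad` flag defaults to `True`, and that is only legal for floating point and complex dtypes. A minimal illustration of the same `RuntimeError` (the int64 tensor here is a stand-in for whatever non-float entry the sharding path hits, not the actual santacoder weight):

```python
import torch

# nn.Parameter defaults to requires_grad=True, which torch only
# permits for floating point and complex dtypes.
int_tensor = torch.zeros(4, dtype=torch.int64)

try:
    torch.nn.Parameter(int_tensor)  # requires_grad defaults to True
except RuntimeError as e:
    print(e)  # same error as in the traceback above

# Passing requires_grad=False avoids the error for non-float tensors.
p = torch.nn.Parameter(int_tensor, requires_grad=False)
print(p.requires_grad)
```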

If I unset WORLD_SIZE, the code snippet above runs successfully.
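
As a stopgap until a fix lands, the torchrun-style variables can also be cleared in-process before loading. A sketch, assuming the tensor-parallel path is triggered solely by these variables being present (`clear_distributed_env` is a hypothetical helper, not a transformers API):

```python
import os

# Variables a torchrun-style launcher sets; none are needed for a
# plain single-process from_pretrained() call.
DIST_VARS = ("WORLD_SIZE", "RANK", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT")

def clear_distributed_env():
    """Remove launcher env vars so model loading takes the
    single-process path instead of shard_and_distribute_module()."""
    for var in DIST_VARS:
        os.environ.pop(var, None)  # no-op if the variable is unset
```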

It looks like a similar issue to #37737.

Who can help?

@Cyrilvallez


Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Install transformers 4.51.3 via pip install transformers
  2. Run the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('bigcode/starcoder', device_map='auto')

Expected behavior

The model loads successfully.

The same failure also reproduces with openai-community/gpt2-medium:

  1. Install transformers 4.51.3 via pip install transformers
  2. Run the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('openai-community/gpt2-medium', device_map='auto', trust_remote_code=True)

Expected behavior: the model loads successfully.

nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz (Contributor, Author) commented:

Hi @Cyrilvallez, could you please review the fix PR #37767?

nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
nv-guomingz changed the title from "Failed to load starcoder model with transformer 4.51.3, it's a similar issue like #37737" to "Failed to load santacoder model with transformer 4.51.3, it's a similar issue like #37737" on Apr 24, 2025.
nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>