Failed to load santacoder model with transformer 4.51.3, it's a similar issue like #37737 #37765

Closed · 8 tasks
nv-guomingz opened this issue Apr 24, 2025 · 1 comment · Fixed by #37790

@nv-guomingz (Contributor) commented Apr 24, 2025

System Info

Hi, I just installed transformers 4.51.3 on a Linux system with an NVIDIA GPU.

When I ran the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('bigcode/santacoder', device_map='auto', trust_remote_code=True)

I also set the following environment variables:

export WORLD_SIZE=1
export RANK=0
export LOCAL_RANK=0
export MASTER_PORT=63563
export MASTER_ADDR=`your localhost` 
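
These are the torchrun-style variables that frameworks inspect to decide whether a process is part of a distributed launch. A sketch of how they are read (an assumption about the general mechanism, not the exact transformers code):

```python
import os

# With WORLD_SIZE set in the environment, transformers 4.51.3 takes the
# tensor-parallel loading path even for a single-process run.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
rank = int(os.environ.get("RANK", "0"))
looks_distributed = "WORLD_SIZE" in os.environ  # set => distributed-style launch
print(f"world_size={world_size} rank={rank} distributed={looks_distributed}")
```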

I got the error message below:

[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4401, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4830, in _load_pretrained_model
[rank0]:     disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
[rank0]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 777, in _load_state_dict_into_meta_model
[rank0]:     shard_and_distribute_module(
[rank0]:   File "/home/guomingz/.local/lib/python3.12/site-packages/transformers/integrations/tensor_parallel.py", line 669, in shard_and_distribute_module
[rank0]:     param = torch.nn.Parameter(param)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/parameter.py", line 46, in __new__
[rank0]:     return torch.Tensor._make_subclass(cls, data, requires_grad)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Only Tensors of floating point and complex dtype can require gradients
[rank0]:[W424 16:00:01.852231233 ProcessGroupNCCL.cpp:1497] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
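
The failing line wraps a checkpoint tensor in `torch.nn.Parameter`, whose `requires_grad` flag defaults to `True`, and that is only legal for floating point and complex dtypes. A minimal illustration of the same `RuntimeError` (the int64 tensor here is a stand-in for whatever non-float entry the sharding path hits, not the actual santacoder weight):

```python
import torch

# nn.Parameter defaults to requires_grad=True, which torch only
# permits for floating point and complex dtypes.
int_tensor = torch.zeros(4, dtype=torch.int64)

try:
    torch.nn.Parameter(int_tensor)  # requires_grad defaults to True
except RuntimeError as e:
    print(e)  # same error as in the traceback above

# Passing requires_grad=False avoids the error for non-float tensors.
p = torch.nn.Parameter(int_tensor, requires_grad=False)
print(p.requires_grad)
```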

If I unset WORLD_SIZE, the code snippet above runs successfully.
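
As a stopgap until a fix lands, the torchrun-style variables can also be cleared in-process before loading. A sketch, assuming the tensor-parallel path is triggered solely by these variables being present (`clear_distributed_env` is a hypothetical helper, not a transformers API):

```python
import os

# Variables a torchrun-style launcher sets; none are needed for a
# plain single-process from_pretrained() call.
DIST_VARS = ("WORLD_SIZE", "RANK", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT")

def clear_distributed_env():
    """Remove launcher env vars so model loading takes the
    single-process path instead of shard_and_distribute_module()."""
    for var in DIST_VARS:
        os.environ.pop(var, None)  # no-op if the variable is unset
```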

It looks like a similar issue to #37737.

Who can help?

@Cyrilvallez


Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Install transformers 4.51.3 via pip install transformers
  2. Run the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('bigcode/starcoder', device_map='auto')

Expected behavior

The model loads successfully.

The same failure also reproduces with openai-community/gpt2-medium:

  1. Install transformers 4.51.3 via pip install transformers
  2. Run the code snippet below:

import transformers

model = transformers.AutoModelForCausalLM.from_pretrained('openai-community/gpt2-medium', device_map='auto', trust_remote_code=True)

Expected behavior: the model loads successfully.

nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz (Contributor, Author) commented:

Hi @Cyrilvallez, could you please review the fix PR #37767?

nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
nv-guomingz changed the title from "Failed to load starcoder model with transformer 4.51.3, it's a similar issue like #37737" to "Failed to load santacoder model with transformer 4.51.3, it's a similar issue like #37737" on Apr 24, 2025.
nv-guomingz pushed a commit to nv-guomingz/transformers that referenced this issue Apr 24, 2025
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>