
Cannot load Qwen-2.5-14B-Instruct on more than 2 cores #965

@vinayvarahabhotla

Description

System Info

Platform:

- Platform: Linux-6.8.0-1031-aws-x86_64-with-glibc2.35
- Python version: 3.11.13


Python packages:

- `optimum-neuron` version: 0.3.0
- `neuron-sdk` version: 2.24.0
- `optimum` version: 1.24.0
- `transformers` version: 4.51.3
- `huggingface_hub` version: 0.34.4
- `torch` version: 2.7.0+cu126
- `aws-neuronx-runtime-discovery` version: NA
- `libneuronxla` version: 2.2.4410.0+835a67fb
- `neuronx-cc` version: 2.19.8089.0+8ab9f450
- `neuronx-distributed` version: 0.13.14393+b8569585
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.7.0.2.8.6734+ac864f72
- `torch-xla` version: 2.7.0
- `transformers-neuronx` version: NA


Neuron Driver:


aws-neuronx-collectives/unknown,now 2.27.34.0-ec8cd5e8b amd64 [installed]
aws-neuronx-dkms/unknown,now 2.23.9.0 all [installed]
aws-neuronx-oci-hook/unknown,now 2.11.42.0 amd64 [installed]
aws-neuronx-runtime-lib/unknown,now 2.27.23.0-8deec4dbf amd64 [installed]
aws-neuronx-tools/unknown,now 2.25.145.0 amd64 [installed]

Who can help?

@JingyaHuang @dacorvo

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

I cannot load Qwen2.5-14B with a long context on more than 2 cores.

Main issue: I compiled a Qwen2.5-14B-Instruct model with the following command:

`optimum-cli export neuron --model Qwen/Qwen2.5-14B-Instruct --sequence_length 16384 --batch_size 1 --num_cores 6 qwen-compiled-16k`

  • The compilation completed successfully.
  • But when I load the model with `model = NeuronModelForCausalLM.from_pretrained('qwen-compiled-16k')`
  • I get the error `RuntimeError: expected shape torch.Size([896, 5120]) for layers.0.self_attn.qkv_proj.k_proj.weight but found torch.Size([4480, 5120])`

But the same model loads and runs fine when I restrict `sequence_length` to 4096 and `num_cores` to 2.
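
For completeness, here is a minimal sketch of the failing load step as a runnable script, assuming the compiled artifacts produced by the export command above live in `./qwen-compiled-16k` (the generation part at the end is only illustrative and is never reached, since the error is raised during `from_pretrained`):

```python
# Minimal reproduction sketch, assuming ./qwen-compiled-16k was produced by
# the `optimum-cli export neuron` command above (sequence_length 16384,
# batch_size 1, num_cores 6).
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# This call raises:
#   RuntimeError: expected shape torch.Size([896, 5120]) for
#   layers.0.self_attn.qkv_proj.k_proj.weight but found torch.Size([4480, 5120])
model = NeuronModelForCausalLM.from_pretrained("qwen-compiled-16k")

# Illustrative only; never reached because the load above fails.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading the 2-core / 4096-token export the same way works, so the mismatch only appears with the 6-core / 16k configuration.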

Expected behavior

The model compiled with `--sequence_length 16384` and `--num_cores 6` should load successfully via `NeuronModelForCausalLM.from_pretrained('qwen-compiled-16k')`, instead of raising the shape-mismatch `RuntimeError` shown above.

Labels

Stale, bug (Something isn't working)