2 changes: 2 additions & 0 deletions vllm_ascend/torchair/models/torchair_deepseek_v2.py
@@ -813,6 +813,8 @@ def forward(
residual = get_tp_group().all_gather(residual, 0)

attn_metadata = get_forward_context().attn_metadata
if attn_metadata is not None and isinstance(attn_metadata, dict):
attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
Contributor review comment (critical):
Using a hardcoded key 'model.layers.0.self_attn.attn' to access attention metadata is incorrect. This will fetch metadata for layer 0 regardless of the current layer being processed, which can lead to erroneous behavior, especially in multi-layer models. The key should be constructed dynamically using the current layer's index (self.layer_idx) to ensure the correct metadata is used.

Suggested change
- attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
+ attn_metadata = attn_metadata[f"model.layers.{self.layer_idx}.self_attn.attn"]
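To make the suggestion concrete, below is a minimal, self-contained sketch of the per-layer lookup the reviewer is asking for. The AttentionMetadata stub, the resolve_attn_metadata helper, and the key format built from the layer index are illustrative assumptions based on the suggestion, not the actual vllm-ascend code:

from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class AttentionMetadata:
    """Minimal stand-in for the real per-layer attention metadata object."""
    num_actual_tokens: int

def resolve_attn_metadata(
    attn_metadata: Optional[Union[AttentionMetadata, dict]],
    layer_idx: int,
) -> Optional[AttentionMetadata]:
    # When the forward context carries a per-layer dict, pick this layer's
    # entry; the key format mirrors the reviewer's suggested change.
    if isinstance(attn_metadata, dict):
        return attn_metadata[f"model.layers.{layer_idx}.self_attn.attn"]
    return attn_metadata

# Each layer resolves its own entry instead of always reading layer 0's.
per_layer = {
    "model.layers.0.self_attn.attn": AttentionMetadata(num_actual_tokens=8),
    "model.layers.1.self_attn.attn": AttentionMetadata(num_actual_tokens=8),
}
assert resolve_attn_metadata(per_layer, layer_idx=1) is per_layer[
    "model.layers.1.self_attn.attn"]

The point is simply that the dict key must vary with the layer being processed; in the real model the layer would use its own index or prefix rather than hardcoding layer 0.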

if attn_metadata is not None:
num_tokens = attn_metadata.num_actual_tokens
else: