Conversation

@yashwantbezawada (Contributor) commented Nov 6, 2025

What does this PR do?

Fixes #42027

This PR fixes a regression where torch.cat receives incorrectly shaped empty tensors during GPT2 model tracing with torch.compile, causing compilation failures.

Background

The issue was introduced in commit dc11a3c (PR #39797), where empty cache tensors were initialized as 1D tensors of shape [0] via torch.tensor([]). When these are concatenated with 4D key/value tensors [batch_size, num_heads, seq_len, head_dim] along dim=-2, torch.compile's tracing fails with empty-tensor errors.
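
A minimal reproduction sketch of the mismatch (shapes are illustrative; the compiled call is commented out since, per this report, it fails during tracing):

```python
import torch

key_states = torch.randn(1, 12, 5, 64)  # [batch_size, num_heads, seq_len, head_dim]
empty_1d = torch.tensor([])             # shape [0], the old cache initialization

# Eager mode tolerates this: torch.cat has a legacy special case that
# skips 1-D empty tensors, so the rank mismatch goes unnoticed.
out = torch.cat([empty_1d, key_states], dim=-2)

# torch.compile's tracing is stricter about the mismatched ranks;
# per this PR, the traced call fails:
concat = torch.compile(lambda cache, new: torch.cat([cache, new], dim=-2))
# concat(empty_1d, key_states)  # raises during tracing
```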

Changes

Modified DynamicLayer.lazy_initialization()

  • Changed from: torch.tensor([], dtype=..., device=...) → shape [0] (1D)
  • Changed to: torch.zeros((batch_size, num_heads, 0, head_dim), dtype=..., device=...) → shape [batch, heads, 0, dim] (4D)

Modified QuantizedLayer.update()

  • Applied same fix when resetting cache after quantization
  • Ensures empty tensors have a proper 4D shape matching the key_states dimensions (see the sketch below)
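
A minimal sketch of the new initialization (the method name follows the PR text; the surrounding code in transformers is simplified here):

```python
import torch

def lazy_initialization(key_states: torch.Tensor):
    """Create empty cache tensors whose non-sequence dims match key_states."""
    batch_size, num_heads, _, head_dim = key_states.shape
    # 4D empty tensors with seq_len == 0: every dim except dim=-2 matches
    # the incoming key/value states, so torch.cat along dim=-2 traces cleanly.
    keys = torch.zeros(
        (batch_size, num_heads, 0, head_dim),
        dtype=key_states.dtype,
        device=key_states.device,
    )
    values = torch.zeros_like(keys)
    return keys, values
```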

Testing

The fix ensures:

  • torch.cat([empty_4d_tensor, key_states], dim=-2) works correctly (checked in the snippet after this list)
  • Compatible with torch.compile tracing
  • Maintains backward compatibility with eager mode
  • Works for both DynamicLayer and QuantizedLayer caches
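
A quick eager-mode check of the first point (shapes are illustrative):

```python
import torch

key_states = torch.randn(1, 12, 5, 64)
empty_4d = torch.zeros((1, 12, 0, 64), dtype=key_states.dtype)

out = torch.cat([empty_4d, key_states], dim=-2)
assert out.shape == key_states.shape  # the empty prefix contributes 0 rows
```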

Impact

  • Fixes a regression introduced between v4.52.4 and v4.57.1
  • Affects models that use DynamicCache with torch.compile (GPT2 and others)
  • No breaking changes to API or behavior

@yashwantbezawada force-pushed the fix/cache-empty-tensor-compile-42027 branch 2 times, most recently from 8bf24e6 to ffd4b63 on November 6, 2025 at 02:12
Fixes huggingface#42027

This commit fixes a regression where torch.cat receives incorrectly
shaped empty tensors during model tracing with torch.compile.

The issue was introduced in commit dc11a3c where empty cache tensors
were initialized as 1D tensors with shape [0] using torch.tensor([]).
When these are concatenated with 4D key/value tensors [batch, heads, seq, dim]
along dim=-2, torch.compile's tracing fails.

Changes:
- Modified DynamicLayer.lazy_initialization() to create properly shaped
  4D empty tensors [batch, heads, 0, dim] instead of 1D [0]
- Modified QuantizedLayer.update() to reset cache with proper 4D shape
- Used torch.zeros() with explicit shape matching key_states dimensions

This ensures torch.cat operations work correctly in both eager and
compiled modes.
@yashwantbezawada force-pushed the fix/cache-empty-tensor-compile-42027 branch from ffd4b63 to 1375af8 on November 6, 2025 at 02:39
@yashwantbezawada (Contributor, Author) commented

I see that the CI tests are failing (tests_exotic_models, tests_generate, tests_torch), while code quality checks pass. I'm unable to access the detailed CircleCI logs to understand the specific test failures.

The changes I made:

  • Changed empty tensor initialization from torch.tensor([]) (1D) to torch.zeros((batch_size, num_heads, 0, head_dim)) (4D with 0 seq_len)
  • Applied this to both DynamicLayer.lazy_initialization and QuantizedLayer.update

This approach ensures torch.cat works correctly in torch.compile mode by providing properly shaped 4D tensors.

Could someone help me understand what tests are failing and why? I'd be happy to adjust the approach if needed. I'm aware of PR #40328 which takes a more comprehensive approach to torch.compile + DynamicCache compatibility.

cc @huggingface/transformers

@Rocketknight1 (Member) commented

This is an update to a PR from @Cyrilvallez, so I'll wait for him to approve it!

Linked issue: Empty tensor in torch model trace for concat operation (#42027)