Conversation

@yashwantbezawada (Contributor) commented Nov 6, 2025

What does this PR do?

Fixes #42027

This PR fixes a regression where torch.cat receives incorrectly shaped empty tensors during GPT2 model tracing with torch.compile, causing compilation failures.

Background

The issue was introduced in commit dc11a3c (PR #39797), where empty cache tensors were initialized as 1D tensors of shape [0] via torch.tensor([]). When these are concatenated with 4D key/value tensors [batch_size, num_heads, seq_len, head_dim] along dim=-2, torch.compile's tracing fails with empty-tensor errors.
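
A minimal reproduction sketch of the mismatch (shapes are illustrative; the compiled call is commented out since, per this report, it fails during tracing):

```python
import torch

key_states = torch.randn(1, 12, 5, 64)  # [batch_size, num_heads, seq_len, head_dim]
empty_1d = torch.tensor([])             # shape [0], the old cache initialization

# Eager mode tolerates this: torch.cat has a legacy special case that
# skips 1-D empty tensors, so the rank mismatch goes unnoticed.
out = torch.cat([empty_1d, key_states], dim=-2)

# torch.compile's tracing is stricter about the mismatched ranks;
# per this PR, the traced call fails:
concat = torch.compile(lambda cache, new: torch.cat([cache, new], dim=-2))
# concat(empty_1d, key_states)  # raises during tracing
```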

Changes

Modified DynamicLayer.lazy_initialization()

  • Changed from: torch.tensor([], dtype=..., device=...) → shape [0] (1D)
  • Changed to: torch.zeros((batch_size, num_heads, 0, head_dim), dtype=..., device=...) → shape [batch, heads, 0, dim] (4D)

Modified QuantizedLayer.update()

  • Applied same fix when resetting cache after quantization
  • Ensures empty tensors have a proper 4D shape matching the key_states dimensions (see the sketch below)
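
A minimal sketch of the new initialization (the method name follows the PR text; the surrounding code in transformers is simplified here):

```python
import torch

def lazy_initialization(key_states: torch.Tensor):
    """Create empty cache tensors whose non-sequence dims match key_states."""
    batch_size, num_heads, _, head_dim = key_states.shape
    # 4D empty tensors with seq_len == 0: every dim except dim=-2 matches
    # the incoming key/value states, so torch.cat along dim=-2 traces cleanly.
    keys = torch.zeros(
        (batch_size, num_heads, 0, head_dim),
        dtype=key_states.dtype,
        device=key_states.device,
    )
    values = torch.zeros_like(keys)
    return keys, values
```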

Testing

The fix ensures:

  • torch.cat([empty_4d_tensor, key_states], dim=-2) works correctly (checked in the snippet after this list)
  • Compatible with torch.compile tracing
  • Maintains backward compatibility with eager mode
  • Works for both DynamicLayer and QuantizedLayer caches
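
A quick eager-mode check of the first point (shapes are illustrative):

```python
import torch

key_states = torch.randn(1, 12, 5, 64)
empty_4d = torch.zeros((1, 12, 0, 64), dtype=key_states.dtype)

out = torch.cat([empty_4d, key_states], dim=-2)
assert out.shape == key_states.shape  # the empty prefix contributes 0 rows
```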

Impact

  • Fixes a regression introduced between v4.52.4 and v4.57.1
  • Affects models that use DynamicCache with torch.compile (GPT2 and others)
  • No breaking changes to API or behavior

@yashwantbezawada force-pushed the fix/cache-empty-tensor-compile-42027 branch 2 times, most recently from 8bf24e6 to ffd4b63 on November 6, 2025 at 02:12
Fixes huggingface#42027

This commit fixes a regression where torch.cat receives incorrectly
shaped empty tensors during model tracing with torch.compile.

The issue was introduced in commit dc11a3c where empty cache tensors
were initialized as 1D tensors with shape [0] using torch.tensor([]).
When these are concatenated with 4D key/value tensors [batch, heads, seq, dim]
along dim=-2, torch.compile's tracing fails.

Changes:
- Modified DynamicLayer.lazy_initialization() to create properly shaped
  4D empty tensors [batch, heads, 0, dim] instead of 1D [0]
- Modified QuantizedLayer.update() to reset cache with proper 4D shape
- Used torch.zeros() with explicit shape matching key_states dimensions

This ensures torch.cat operations work correctly in both eager and
compiled modes.
@yashwantbezawada force-pushed the fix/cache-empty-tensor-compile-42027 branch from ffd4b63 to 1375af8 on November 6, 2025 at 02:39
@yashwantbezawada (Contributor, Author) commented

I see that the CI tests are failing (tests_exotic_models, tests_generate, tests_torch), while code quality checks pass. I'm unable to access the detailed CircleCI logs to understand the specific test failures.

The changes I made:

  • Changed empty tensor initialization from torch.tensor([]) (1D) to torch.zeros((batch_size, num_heads, 0, head_dim)) (4D with 0 seq_len)
  • Applied this to both DynamicLayer.lazy_initialization and QuantizedLayer.update

This approach ensures torch.cat works correctly in torch.compile mode by providing properly shaped 4D tensors.

Could someone help me understand what tests are failing and why? I'd be happy to adjust the approach if needed. I'm aware of PR #40328 which takes a more comprehensive approach to torch.compile + DynamicCache compatibility.

cc @huggingface/transformers

@Rocketknight1 (Member) commented

This is an update to a PR from @Cyrilvallez, so I'll wait for him to approve it!

Linked issue: Empty tensor in torch model trace for concat operation (#42027)