Fix empty tensor shape issue in DynamicCache for torch.compile #42053
What does this PR do?
Fixes #42027
This PR fixes a regression where torch.cat receives incorrectly shaped empty tensors during GPT2 model tracing with torch.compile, causing compilation failures.
Background
The issue was introduced in commit dc11a3c (PR #39797), where empty cache tensors were initialized as 1D tensors of shape [0] via torch.tensor([]). When these are concatenated with 4D key/value tensors of shape [batch_size, num_heads, seq_len, head_dim] along dim=-2, torch.compile's tracing fails with empty-tensor errors.
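To illustrate the shape mismatch described above, here is a minimal sketch (tensor sizes are illustrative, not taken from the PR): a 1D torch.tensor([]) has shape [0], while a correctly shaped empty cache is 4D with a zero-length sequence dimension, which concatenates cleanly along dim=-2.

```python
import torch

batch, heads, head_dim = 2, 4, 8

# The problematic initialization: a 1D empty tensor of shape [0].
bad_empty = torch.tensor([])
assert bad_empty.shape == (0,)  # 1D, not 4D

# A correctly shaped empty cache: 4D with seq_len == 0.
empty_cache = torch.zeros(batch, heads, 0, head_dim)

# Concatenating with real 4D key/value states along dim=-2 works
# and preserves the [batch, num_heads, seq_len, head_dim] layout.
new_kv = torch.randn(batch, heads, 3, head_dim)
out = torch.cat([empty_cache, new_kv], dim=-2)
assert out.shape == (batch, heads, 3, head_dim)
```

Eager mode tolerates the 1D empty tensor through a legacy special case in torch.cat, which is why the mismatch only surfaces when torch.compile traces the concatenation.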
Changes
Modified DynamicLayer.lazy_initialization()
Modified QuantizedLayer.update()
Testing
The fix ensures:
Impact