Fix overflow in CopyCpuTensor for sub-byte types#28171
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a heap overflow/corruption risk in the CPU EP tensor copy path for sub-byte packed element types (e.g., int4/uint4, int2/uint2) by ensuring copy byte counts match the packed storage size.
Changes:
- Update
CopyCpuTensorto copy usingTensor::SizeInBytes()instead ofShape().Size() * DataType()->Size(). - Refactor
IdentityOp::Computeto reuseCopyCpuTensor, removing duplicated (and previously unsafe) memcpy logic. - Add CPU regression tests covering same-type casts and direct
CopyCpuTensorcopies for packed sub-byte types (odd/even sizes and larger shapes).
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/tensor/cast_op_test.cc | Adds regression tests for packed sub-byte same-type cast and direct CopyCpuTensor memcpy behavior. |
| onnxruntime/core/providers/cpu/tensor/utils.h | Fixes memcpy byte-count calculation by using src->SizeInBytes(). |
| onnxruntime/core/providers/cpu/tensor/identity_op.h | Replaces custom copy logic with CopyCpuTensor to avoid duplicated unsafe sizing. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 1
tianleiwu
left a comment
There was a problem hiding this comment.
Implementation looks good overall: moving the non-string copy path to Tensor::SizeInBytes() fixes the packed sub-byte overflow in the shared helper, and reusing CopyCpuTensor from IdentityOp closes the duplicate site there too.
I left one suggestion on the regression coverage: the new direct-copy test checks the copied payload bytes, but it does not deterministically prove that the buggy implementation stayed in bounds on non-ASan runs.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tianleiwu
left a comment
There was a problem hiding this comment.
The current head looks good. Switching the shared CPU copy path to Tensor::SizeInBytes() fixes the packed sub-byte overflow at the right abstraction boundary, and routing IdentityOp through CopyCpuTensor removes the duplicated buggy byte-count logic.
The updated regression coverage also addresses the earlier test gap: the direct-copy test now uses distinct guarded buffers, so the old overflow is caught deterministically without relying on sanitizer behavior. I did not find any remaining correctness, performance, or maintainability issue in the changed paths on this head.
|
@tianleiwu thanks for the review! The Linux TensorRT CI seems to be consistently failing. I see errors like this: The same error impacts the tests I've added as well. Is the CastOpTest suite not supposed to run on CUDA EP? (Seems like this would have been a pre-existing issue in that case, but I didn't see any issues logged.) |
Please merge latest main branch. There is a fix updated cuda driver for that. |
This cherry-picks the following commits for the release: #28171 Fix overflow in CopyCpuTensor for sub-byte types
|
Hi @adrastogi , will this PR be cherry picked to ORT1.25? |
|
@anskumar01 thanks for flagging. @devang-ml is there runway for cherry-picking this into 1.25.1? I am not sure of the schedule there. |
This cherry-picks the following commits for the release: #28171 Fix overflow in CopyCpuTensor for sub-byte types #27808 [VitisAI] pass base timestamp for vitisai profiling --------- Signed-off-by: Andrew Luo <junpengl@amd.com> Co-authored-by: andrew0917 <68576025+andrew0917@users.noreply.github.com> Co-authored-by: Andrew Luo <junpengl@amd.com>
Description
When copying tensors of sub-byte types (e.g., int4, uint4), two code paths in the CPU EP's CopyCpuTensor function use the element count as the byte count during memcpy, but those types are only 4 bits (0.5 bytes). This causes ORT to copy the incorrect number of bytes, which can cause overflow / corruption.
The fix is to replace the math with
Tensor::SizeInBytes(), which already correctly handles sub-byte packing. There is a second site inIdentityOp::Computewhich has been refactored to callCopyCpuTensordirectly, eliminating the duplicated logic.The change also adds regression tests for different variations of this scenario (I was able to use them to reproduce the original issue without the proposed updates).
Motivation and Context
This fixes a problem in the CPU EP in the scenario described above.