Fix overflow in CopyCpuTensor for sub-byte types by adrastogi · Pull Request #28171 · microsoft/onnxruntime

adrastogi · 2026-04-22T06:01:42Z

Description

When copying tensors of sub-byte types (e.g., int4, uint4), two code paths in the CPU EP's CopyCpuTensor function use the element count as the byte count during memcpy, but those types are only 4 bits (0.5 bytes). This causes ORT to copy the incorrect number of bytes, which can cause overflow / corruption.

The fix is to replace the math with Tensor::SizeInBytes(), which already correctly handles sub-byte packing. There is a second site in IdentityOp::Compute which has been refactored to call CopyCpuTensor directly, eliminating the duplicated logic.

The change also adds regression tests for different variations of this scenario (I was able to use them to reproduce the original issue without the proposed updates).

Motivation and Context

This fixes a problem in the CPU EP in the scenario described above.

Copilot

Pull request overview

This PR fixes a heap overflow/corruption risk in the CPU EP tensor copy path for sub-byte packed element types (e.g., int4/uint4, int2/uint2) by ensuring copy byte counts match the packed storage size.

Changes:

Update CopyCpuTensor to copy using Tensor::SizeInBytes() instead of Shape().Size() * DataType()->Size().
Refactor IdentityOp::Compute to reuse CopyCpuTensor, removing duplicated (and previously unsafe) memcpy logic.
Add CPU regression tests covering same-type casts and direct CopyCpuTensor copies for packed sub-byte types (odd/even sizes and larger shapes).

Show a summary per file

File	Description
onnxruntime/test/providers/cpu/tensor/cast_op_test.cc	Adds regression tests for packed sub-byte same-type cast and direct `CopyCpuTensor` memcpy behavior.
onnxruntime/core/providers/cpu/tensor/utils.h	Fixes memcpy byte-count calculation by using `src->SizeInBytes()`.
onnxruntime/core/providers/cpu/tensor/identity_op.h	Replaces custom copy logic with `CopyCpuTensor` to avoid duplicated unsafe sizing.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 3/3 changed files
Comments generated: 1

github-actions

You can commit the suggested changes from lintrunner.

tianleiwu

Implementation looks good overall: moving the non-string copy path to Tensor::SizeInBytes() fixes the packed sub-byte overflow in the shared helper, and reusing CopyCpuTensor from IdentityOp closes the duplicate site there too.

I left one suggestion on the regression coverage: the new direct-copy test checks the copied payload bytes, but it does not deterministically prove that the buggy implementation stayed in bounds on non-ASan runs.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tianleiwu

The current head looks good. Switching the shared CPU copy path to Tensor::SizeInBytes() fixes the packed sub-byte overflow at the right abstraction boundary, and routing IdentityOp through CopyCpuTensor removes the duplicated buggy byte-count logic.

The updated regression coverage also addresses the earlier test gap: the direct-copy test now uses distinct guarded buffers, so the old overflow is caught deterministically without relying on sanitizer behavior. I did not find any remaining correctness, performance, or maintainability issue in the changed paths on this head.

adrastogi · 2026-04-23T07:08:11Z

@tianleiwu thanks for the review! The Linux TensorRT CI seems to be consistently failing. I see errors like this:

2026-04-23T03:10:34.9340117Z 2: [----------] 153 tests from CastOpTest
2026-04-23T03:10:34.9340224Z 2: [ RUN      ] CastOpTest.NonStringTypes
2026-04-23T03:10:34.9340289Z 2: unknown file: Failure
2026-04-23T03:10:34.9342955Z 2: C++ exception with description "/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=7f6d4e25f76b ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=336 ; expr=cudaSetDevice(info_.device_id); 
2026-04-23T03:10:34.9343019Z 2: 
2026-04-23T03:10:34.9343088Z 2: " thrown in the test body.
2026-04-23T03:10:34.9343140Z 2: 
2026-04-23T03:10:34.9343238Z 2: [  FAILED  ] CastOpTest.NonStringTypes (0 ms)
2026-04-23T03:10:34.9343313Z 2: [ RUN      ] CastOpTest.FromString
2026-04-23T03:10:34.9343378Z 2: unknown file: Failure
2026-04-23T03:10:34.9345860Z 2: C++ exception with description "/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=7f6d4e25f76b ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=336 ; expr=cudaSetDevice(info_.device_id); 
2026-04-23T03:10:34.9346159Z 2: 
2026-04-23T03:10:34.9346228Z 2: " thrown in the test body.
2026-04-23T03:10:34.9346276Z 2: 
2026-04-23T03:10:34.9346359Z 2: [  FAILED  ] CastOpTest.FromString (0 ms)

The same error impacts the tests I've added as well. Is the CastOpTest suite not supposed to run on CUDA EP? (Seems like this would have been a pre-existing issue in that case, but I didn't see any issues logged.)

tianleiwu · 2026-04-23T08:35:23Z

@tianleiwu thanks for the review! The Linux TensorRT CI seems to be consistently failing. I see errors like this:

Please merge latest main branch. There is a fix updated cuda driver for that.

fdwr

👍

This cherry-picks the following commits for the release: #28171 Fix overflow in CopyCpuTensor for sub-byte types

anskumar01 · 2026-04-29T18:37:35Z

Hi @adrastogi , will this PR be cherry picked to ORT1.25?

adrastogi · 2026-04-30T05:13:07Z

@anskumar01 thanks for flagging. @devang-ml is there runway for cherry-picking this into 1.25.1? I am not sure of the schedule there.

This cherry-picks the following commits for the release: #28171 Fix overflow in CopyCpuTensor for sub-byte types #27808 [VitisAI] pass base timestamp for vitisai profiling --------- Signed-off-by: Andrew Luo <junpengl@amd.com> Co-authored-by: andrew0917 <68576025+andrew0917@users.noreply.github.com> Co-authored-by: Andrew Luo <junpengl@amd.com>

Candidate fix

959b671

adrastogi requested a review from Copilot April 22, 2026 06:01

Copilot started reviewing on behalf of adrastogi April 22, 2026 06:02 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread onnxruntime/test/providers/cpu/tensor/cast_op_test.cc Outdated

github-actions Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread onnxruntime/test/providers/cpu/tensor/cast_op_test.cc Outdated

Comment thread onnxruntime/test/providers/cpu/tensor/cast_op_test.cc Outdated

PR feedback (fix comment)

ece6a83

tianleiwu requested a review from Copilot April 22, 2026 07:50

Copilot started reviewing on behalf of tianleiwu April 22, 2026 07:51 View session

tianleiwu reviewed Apr 22, 2026

View reviewed changes

Comment thread onnxruntime/test/providers/cpu/tensor/cast_op_test.cc

Copilot AI reviewed Apr 22, 2026

View reviewed changes

PR feedback (stronger validation approach)

98ae887

adrastogi requested a review from tianleiwu April 23, 2026 02:57

tianleiwu approved these changes Apr 23, 2026

View reviewed changes

Merge branch 'main' into adrastogi/cpu-fix

a8853a4

tianleiwu merged commit 4265122 into main Apr 23, 2026
88 of 89 checks passed

tianleiwu deleted the adrastogi/cpu-fix branch April 23, 2026 15:55

fdwr reviewed Apr 23, 2026

View reviewed changes

adrastogi mentioned this pull request Apr 27, 2026

ORT 1.24.6 Cherry Picks #28246

Merged

adrastogi added a commit that referenced this pull request Apr 29, 2026

ORT 1.24.6 Cherry Picks (#28246)

800ac32

This cherry-picks the following commits for the release: #28171 Fix overflow in CopyCpuTensor for sub-byte types

adrastogi mentioned this pull request May 4, 2026

ORT 1.25.2 Cherry Picks, Round 1 #28338

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix overflow in CopyCpuTensor for sub-byte types#28171

Fix overflow in CopyCpuTensor for sub-byte types#28171
tianleiwu merged 4 commits into
mainfrom
adrastogi/cpu-fix

adrastogi commented Apr 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

tianleiwu left a comment

Uh oh!

adrastogi commented Apr 23, 2026 •

edited

Loading

Uh oh!

tianleiwu commented Apr 23, 2026

Uh oh!

Uh oh!

fdwr left a comment

Uh oh!

anskumar01 commented Apr 29, 2026

Uh oh!

adrastogi commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

adrastogi commented Apr 22, 2026

Description

Motivation and Context

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

adrastogi commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tianleiwu commented Apr 23, 2026

Uh oh!

Uh oh!

fdwr left a comment

Choose a reason for hiding this comment

Uh oh!

anskumar01 commented Apr 29, 2026

Uh oh!

adrastogi commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

adrastogi commented Apr 23, 2026 •

edited

Loading