
refactor: Enhance memory allocation security in HTTP and Sagemaker request handler #8305


Merged
merged 9 commits into from
Jul 25, 2025

Conversation

yinggeh
Contributor

@yinggeh yinggeh commented Jul 22, 2025

What does the PR do?

The HTTP request handling code uses alloca() to allocate memory on the stack. By sending an HTTP request with chunked transfer encoding containing a large number of chunks, an attacker can trigger a stack overflow. This PR uses std::vector for safer dynamic memory allocation.

  1. Update EVRequestToJsonImpl to replace alloca with std::vector.
  2. Refactor repeated code in API handlers to EVRequestToJson and EVRequestToJsonAllowsEmpty.
  3. Remove EVBufferToJson duplicate implementation in sagemaker_server.cc.
  4. Add unit tests to L0_http and L0_sagemaker.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • refactor

Related PRs:

Where should the reviewer start?

Test plan:

L0_http
L0_sagemaker

  • CI Pipeline ID:
    32259137

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@yinggeh yinggeh requested review from pskiran1 and mattwittwer July 22, 2025 23:45
@yinggeh yinggeh self-assigned this Jul 22, 2025
@yinggeh yinggeh added the PR: refactor A code change that neither fixes a bug nor adds a feature label Jul 22, 2025
@yinggeh yinggeh changed the title refactor: Replace alloca() with safer dynamic memory allocation refactor: Enhance memory allocation in HTTP and Sagemaker request handler Jul 22, 2025
@yinggeh yinggeh requested a review from rmccorm4 July 23, 2025 00:07
@yinggeh yinggeh changed the title refactor: Enhance memory allocation in HTTP and Sagemaker request handler refactor: Enhance memory allocation security in HTTP and Sagemaker request handler Jul 23, 2025
@rmccorm4 rmccorm4 requested a review from kthui July 23, 2025 21:03
@kthui kthui requested a review from Copilot July 24, 2025 17:26

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances memory allocation security in HTTP and SageMaker request handlers by replacing unsafe alloca() stack allocation with safer std::vector dynamic allocation to prevent potential stack overflow attacks through chunked transfer encoding.

  • Replaces alloca() with std::vector in EVRequestToJsonImpl to prevent stack overflow vulnerabilities
  • Refactors duplicate EVBufferToJson implementations and standardizes request parsing with new helper functions
  • Adds comprehensive test coverage for requests with many chunks to validate the security improvements

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Summary per file:

src/http_server.h: Adds new request parsing methods and moves the EVBufferToJson declaration to the class
src/http_server.cc: Implements secure memory allocation with std::vector and adds chunk count limits
src/sagemaker_server.cc: Removes the duplicate EVBufferToJson and refactors to use centralized request parsing
src/common.h: Defines the HTTP_MAX_CHUNKS constant for chunk count validation
qa/L0_http/test.sh: Adds test execution for the HTTP many-chunks test
qa/L0_http/http_request_many_chunks.py: Comprehensive test suite for HTTP endpoints with many chunks
qa/L0_sagemaker/test.sh: Adds test execution for the SageMaker many-chunks test
qa/L0_sagemaker/sagemaker_request_many_chunks.py: Test suite for SageMaker endpoints with many chunks
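The many-chunks tests rely on HTTP/1.1 chunked transfer framing, where each chunk is a hex length line followed by the data and a terminating zero-length chunk. A minimal, self-contained sketch of that framing is below; the helper name and chunk count are illustrative (the real qa tests use a client library against a live server):

```python
def chunked_encode(chunks):
    """Encode an iterable of byte chunks using HTTP/1.1 chunked framing."""
    out = bytearray()
    for c in chunks:
        # each chunk: <hex length>\r\n<data>\r\n
        out += b"%x\r\n" % len(c) + c + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)

# Many tiny chunks: each one becomes a separate evbuffer extent server-side,
# which is what drove the per-chunk iovec allocation this PR hardens.
body = chunked_encode(b"x" for _ in range(1000))
```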

Contributor

@kthui kthui left a comment


Nice work switching from allocating on stack to on heap!

One small improvement we can make is to catch the exception from the vector heap allocation when we are out of memory; then we don't have to set an explicit limit on the number of HTTP chunks.

@kthui
Contributor

kthui commented Jul 24, 2025

@rmccorm4 @GuanLuo do you have concerns about the performance impact of allocating on the heap instead of the stack?

@kthui kthui requested a review from GuanLuo July 24, 2025 19:06
@yinggeh yinggeh requested a review from kthui July 24, 2025 22:05
@yinggeh
Contributor Author

yinggeh commented Jul 25, 2025

@rmccorm4 @GuanLuo do you have concerns about the performance impact of allocating on the heap instead of the stack?

The HTTP handler isn't the bottleneck in most cases.

Contributor

@kthui kthui left a comment


Nice work removing the hard limit on the number of HTTP chunks!

This code pattern is repeated three times:

    int n = evbuffer_peek(input_buffer, -1, NULL, NULL, 0);
    if (n > 0) {
      try {
        v_vec = std::vector<struct evbuffer_iovec>(n);
      }
      catch (const std::bad_alloc& e) {
        // Handle memory allocation failure
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INVALID_ARG,
            (std::string("Memory allocation failed for evbuffer: ") + e.what())
                .c_str());
      }
      catch (const std::exception& e) {
        // Catch any other std exceptions
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INTERNAL,
            (std::string("Exception while creating evbuffer vector: ") +
             e.what())
                .c_str());
      }

      v = v_vec.data();
      if (evbuffer_peek(input_buffer, -1, NULL, v, n) != n) {
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INTERNAL,
            "unexpected error getting input buffers");
      }
    }

They could be merged into one, but this can be addressed later.

Two things we should have on this PR:

  1. Tests showing the out-of-memory exception is caught during request memory allocation - asserting we addressed the original concern.
  2. A pipeline that passes all tests involving the HTTP server - asserting we are not breaking anything with this change.

Comment on lines +2879 to +2888

    try {
      v_vec = std::vector<struct evbuffer_iovec>(n);
    }
    catch (const std::bad_alloc& e) {
      // Handle memory allocation failure
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_INVALID_ARG,
          (std::string("Memory allocation failed for evbuffer: ") + e.what())
              .c_str());
    }
Contributor


Can you add tests making sure we are catching the out-of-memory exception, since that is the root reason for this fix? You might want to look at setting ulimit when launching Triton.

Contributor Author

@yinggeh yinggeh Jul 25, 2025


To clarify, the root reason for this change is to prevent a segfault caused by stack overflow (which has been fixed), since the stack is much smaller than the heap, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request).

To test the out-of-memory response, I set the chunk number to n = 10,000,000 (a large chunk count also takes an extremely long time and significant local resources to transfer), so the server allocates the metadata std::vector<struct evbuffer_iovec>(n), which is 16 bytes * 10,000,000 ≈ 160 MB. The server itself needs ~18 GB to start. Because the actual payload can be much larger than the metadata, when I send this request the payload consumes far more than 160 MB and crashes the server before it even reaches std::vector<struct evbuffer_iovec>(n).

The ulimit -v value here is also extremely sensitive to tritonserver's memory usage. If the size of any dependency library changes, or our source code changes, the test will break and the ulimit -v value will need to be updated. I don't think that's reasonable or feasible.

@yinggeh
Contributor Author

yinggeh commented Jul 25, 2025

Nice work removing the hard limit on number of http chunks! […] They could be merged into one, but this can be addressed later. Two things we should have on this PR: (1) tests showing the out-of-memory exception is caught during request memory allocation; (2) a pipeline that passes all tests that involve the http server.

Yeah. The two repeated code blocks can't really reuse the helper functions (EVRequestToJson, etc.) I created; they do not contain the complete parsing logic.

To clarify, the original concern is to prevent a segfault caused by stack overflow (which has been fixed), since the stack is much smaller, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request), and adding such a check is out of scope.

@kthui
Contributor

kthui commented Jul 25, 2025

Synced offline:
The size of the program can be better predicted by starting Triton with no models and turning off all features unrelated to the HTTP server. Then, send an HTTP request to the server and see if that can trigger an OOM.

Contributor

@kthui kthui left a comment


@yinggeh confirmed we cannot reproduce the heap request memory allocation OOM scenario, because other parts of Triton segfault before the heap allocation OOM is reached.

Since the original vulnerability is no longer reproducible after the fix and the original reproductions are added to the CI tests, we can merge this PR for having the fix in the main branch as soon as possible.

@yinggeh will follow up on verifying that all tests involving the HTTP server pass on CI.

@yinggeh yinggeh merged commit ad72741 into main Jul 25, 2025
3 checks passed
@yinggeh yinggeh deleted the yinggeh-DLIS-8402-replace-alloca branch July 25, 2025 23:37
mc-nv pushed a commit that referenced this pull request Jul 25, 2025
mc-nv added a commit that referenced this pull request Jul 25, 2025
…quest handler (#8305) (#8314)

Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Labels
DLIS-8401 DLIS-8402 PR: refactor A code change that neither fixes a bug nor adds a feature TPRD-1628