
refactor: Enhance memory allocation security in HTTP and Sagemaker request handler #8305


Merged
merged 9 commits into from
Jul 25, 2025

Conversation

yinggeh
Contributor

@yinggeh yinggeh commented Jul 22, 2025

What does the PR do?

The HTTP request handling code uses alloca() to allocate memory on the stack. By sending an HTTP request with chunked transfer encoding containing a large number of chunks, an attacker can trigger a stack overflow. This PR uses std::vector for safer dynamic memory allocation.

  1. Update EVRequestToJsonImpl to replace alloca with std::vector.
  2. Refactor repeated code in API handlers to EVRequestToJson and EVRequestToJsonAllowsEmpty.
  3. Remove EVBufferToJson duplicate implementation in sagemaker_server.cc.
  4. Add unit tests to L0_http and L0_sagemaker.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • refactor

Related PRs:

Where should the reviewer start?

Test plan:

L0_http
L0_sagemaker

  • CI Pipeline ID:
    32259137

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@yinggeh yinggeh requested review from pskiran1 and mattwittwer July 22, 2025 23:45
@yinggeh yinggeh self-assigned this Jul 22, 2025
@yinggeh yinggeh added the PR: refactor A code change that neither fixes a bug nor adds a feature label Jul 22, 2025
@yinggeh yinggeh changed the title refactor: Replace alloca() with safer dynamic memory allocation refactor: Enhance memory allocation in HTTP and Sagemaker request handler Jul 22, 2025
@yinggeh yinggeh requested a review from rmccorm4 July 23, 2025 00:07
@yinggeh yinggeh changed the title refactor: Enhance memory allocation in HTTP and Sagemaker request handler refactor: Enhance memory allocation security in HTTP and Sagemaker request handler Jul 23, 2025
@rmccorm4 rmccorm4 requested a review from kthui July 23, 2025 21:03
@kthui kthui requested a review from Copilot July 24, 2025 17:26

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances memory allocation security in HTTP and SageMaker request handlers by replacing unsafe alloca() stack allocation with safer std::vector dynamic allocation to prevent potential stack overflow attacks through chunked transfer encoding.

  • Replaces alloca() with std::vector in EVRequestToJsonImpl to prevent stack overflow vulnerabilities
  • Refactors duplicate EVBufferToJson implementations and standardizes request parsing with new helper functions
  • Adds comprehensive test coverage for requests with many chunks to validate the security improvements

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Summary per file:

src/http_server.h: Adds new request parsing methods and moves the EVBufferToJson declaration to the class
src/http_server.cc: Implements secure memory allocation with std::vector and adds chunk count limits
src/sagemaker_server.cc: Removes the duplicate EVBufferToJson and refactors to use centralized request parsing
src/common.h: Defines the HTTP_MAX_CHUNKS constant for chunk count validation
qa/L0_http/test.sh: Adds test execution for the HTTP many-chunks test
qa/L0_http/http_request_many_chunks.py: Comprehensive test suite for HTTP endpoints with many chunks
qa/L0_sagemaker/test.sh: Adds test execution for the SageMaker many-chunks test
qa/L0_sagemaker/sagemaker_request_many_chunks.py: Test suite for SageMaker endpoints with many chunks
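The many-chunks tests rely on HTTP/1.1 chunked transfer framing, where each chunk is a hex length line followed by the data and a terminating zero-length chunk. A minimal, self-contained sketch of that framing is below; the helper name and chunk count are illustrative (the real qa tests use a client library against a live server):

```python
def chunked_encode(chunks):
    """Encode an iterable of byte chunks using HTTP/1.1 chunked framing."""
    out = bytearray()
    for c in chunks:
        # each chunk: <hex length>\r\n<data>\r\n
        out += b"%x\r\n" % len(c) + c + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)

# Many tiny chunks: each one becomes a separate evbuffer extent server-side,
# which is what drove the per-chunk iovec allocation this PR hardens.
body = chunked_encode(b"x" for _ in range(1000))
```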

Contributor

@kthui kthui left a comment


Nice work switching from allocating on stack to on heap!

One small improvement we can make is to catch the exception from the vector heap allocation when we are out of memory; then we don't have to set an explicit limit on the number of HTTP chunks.

@kthui
Contributor

kthui commented Jul 24, 2025

@rmccorm4 @GuanLuo do you have concerns about the performance impact of allocating on the heap instead of the stack?

@kthui kthui requested a review from GuanLuo July 24, 2025 19:06
@yinggeh yinggeh requested a review from kthui July 24, 2025 22:05
@yinggeh
Contributor Author

yinggeh commented Jul 25, 2025

@rmccorm4 @GuanLuo do you have concerns about the performance impact of allocating on the heap instead of the stack?

The HTTP handler isn't the bottleneck in most cases.

Contributor

@kthui kthui left a comment


Nice work removing the hard limit on the number of HTTP chunks!

This code pattern is repeated three times:

    int n = evbuffer_peek(input_buffer, -1, NULL, NULL, 0);
    if (n > 0) {
      try {
        v_vec = std::vector<struct evbuffer_iovec>(n);
      }
      catch (const std::bad_alloc& e) {
        // Handle memory allocation failure
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INVALID_ARG,
            (std::string("Memory allocation failed for evbuffer: ") + e.what())
                .c_str());
      }
      catch (const std::exception& e) {
        // Catch any other std exceptions
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INTERNAL,
            (std::string("Exception while creating evbuffer vector: ") +
             e.what())
                .c_str());
      }

      v = v_vec.data();
      if (evbuffer_peek(input_buffer, -1, NULL, v, n) != n) {
        return TRITONSERVER_ErrorNew(
            TRITONSERVER_ERROR_INTERNAL,
            "unexpected error getting input buffers");
      }
    }

They could be merged into one, but this can be addressed later.

Two things we should have on this PR:

  1. Tests showing the out-of-memory exception is caught during request memory allocation - asserting we addressed the original concern.
  2. A pipeline that passes all tests involving the HTTP server - asserting we are not breaking anything with this change.

Comment on lines +2879 to +2888

    try {
      v_vec = std::vector<struct evbuffer_iovec>(n);
    }
    catch (const std::bad_alloc& e) {
      // Handle memory allocation failure
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_INVALID_ARG,
          (std::string("Memory allocation failed for evbuffer: ") + e.what())
              .c_str());
    }
Contributor


Can you add tests making sure we are catching the out-of-memory exception, since that is the root reason for this fix? You might want to look at setting ulimit when launching Triton.

Contributor Author

@yinggeh yinggeh Jul 25, 2025


To clarify, the root reason for this change is to prevent a segfault caused by stack overflow (which has been fixed), since the stack is much smaller than the heap, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request).

To test the out-of-memory response, I set the chunk number to n = 10,000,000 (a large chunk count also takes an extremely long time and significant local resources to transfer), so the server allocates the metadata std::vector<struct evbuffer_iovec>(n), which is 16 bytes * 10,000,000 ≈ 160 MB. The server itself needs ~18 GB to start. Because the actual payload can be much larger than the metadata, when I send this request the payload consumes far more than 160 MB and crashes the server before it even reaches std::vector<struct evbuffer_iovec>(n).

The ulimit -v value here is also extremely sensitive to tritonserver's memory usage. If the size of any dependency library changes, or our source code changes, the test will break and the ulimit -v value will need to be updated. I don't think that's reasonable or feasible.

@yinggeh
Contributor Author

yinggeh commented Jul 25, 2025

Nice work removing the hard limit on number of http chunks! […] They could be merged into one, but this can be addressed later. Two things we should have on this PR: (1) tests showing the out-of-memory exception is caught during request memory allocation; (2) a pipeline that passes all tests that involve the http server.

Yeah. The two repeated code blocks can't really reuse the helper functions (EVRequestToJson, etc.) I created; they do not contain the complete parsing logic.

To clarify, the original concern is to prevent a segfault caused by stack overflow (which has been fixed), since the stack is much smaller, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request), and adding such a check is out of scope.

@kthui
Contributor

kthui commented Jul 25, 2025

Synced offline:
The size of the program can be better predicted by starting Triton with no models and turning off all features unrelated to the HTTP server. Then, send an HTTP request to the server and see if that can trigger an OOM.

Contributor

@kthui kthui left a comment


@yinggeh confirmed we cannot reproduce the heap request memory allocation OOM scenario, because other parts of Triton segfault before the heap allocation OOM is reached.

Since the original vulnerability is no longer reproducible after the fix and the original reproductions are added to the CI tests, we can merge this PR for having the fix in the main branch as soon as possible.

@yinggeh will follow up on verifying that all tests involving the HTTP server pass on CI.

@yinggeh yinggeh merged commit ad72741 into main Jul 25, 2025
3 checks passed
@yinggeh yinggeh deleted the yinggeh-DLIS-8402-replace-alloca branch July 25, 2025 23:37
mc-nv pushed a commit that referenced this pull request Jul 25, 2025
mc-nv added a commit that referenced this pull request Jul 25, 2025
…quest handler (#8305) (#8314)

Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Labels
DLIS-8401 DLIS-8402 PR: refactor A code change that neither fixes a bug nor adds a feature TPRD-1628