refactor: Enhance memory allocation security in HTTP and Sagemaker request handler #8305
Conversation
Pull Request Overview
This PR enhances memory allocation security in the HTTP and SageMaker request handlers by replacing unsafe `alloca()` stack allocation with safer `std::vector` dynamic allocation, preventing potential stack overflow attacks through chunked transfer encoding.

- Replaces `alloca()` with `std::vector` in `EVRequestToJsonImpl` to prevent stack overflow vulnerabilities
- Refactors duplicate `EVBufferToJson` implementations and standardizes request parsing with new helper functions
- Adds comprehensive test coverage for requests with many chunks to validate the security improvements
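The swap described above can be sketched as a minimal comparison of the unsafe `alloca()` pattern and the `std::vector` replacement. This is an illustrative sketch, not the PR's actual code: the `evbuffer_iovec` struct is stubbed so it compiles without libevent, and `AllocateIovecs` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Stand-in for libevent's evbuffer_iovec (pointer + length), so this
// sketch compiles without the libevent headers.
struct evbuffer_iovec {
  void* iov_base;
  size_t iov_len;
};

// Before (unsafe): n is attacker-controlled via chunked transfer encoding,
// so alloca(sizeof(evbuffer_iovec) * n) can exhaust the few-MB stack:
//   struct evbuffer_iovec* v =
//       static_cast<struct evbuffer_iovec*>(alloca(sizeof(*v) * n));
//
// After (safer): heap allocation; a failed allocation throws std::bad_alloc,
// which can be caught and reported, instead of silently smashing the stack.
bool AllocateIovecs(int n, std::vector<struct evbuffer_iovec>* out) {
  if (n <= 0) {
    return false;  // nothing to allocate
  }
  try {
    *out = std::vector<struct evbuffer_iovec>(n);
  }
  catch (const std::bad_alloc&) {
    return false;  // report an error to the caller instead of crashing
  }
  return true;
}
```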
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/http_server.h` | Adds new request parsing methods and moves the `EVBufferToJson` declaration into the class |
| `src/http_server.cc` | Implements secure memory allocation with `std::vector` and adds chunk count limits |
| `src/sagemaker_server.cc` | Removes the duplicate `EVBufferToJson` and refactors to use centralized request parsing |
| `src/common.h` | Defines the `HTTP_MAX_CHUNKS` constant for chunk count validation |
| `qa/L0_http/test.sh` | Adds test execution for the HTTP many-chunks test |
| `qa/L0_http/http_request_many_chunks.py` | Comprehensive test suite for HTTP endpoints with many chunks |
| `qa/L0_sagemaker/test.sh` | Adds test execution for the SageMaker many-chunks test |
| `qa/L0_sagemaker/sagemaker_request_many_chunks.py` | Test suite for SageMaker endpoints with many chunks |
Nice work switching from stack allocation to heap allocation!
One small improvement we can make is to catch the exception thrown by the vector's heap allocation when we are out of memory; then we don't have to set an explicit limit on the number of HTTP chunks.
Nice work removing the hard limit on the number of HTTP chunks!
There are three repeats of this code pattern:

```cpp
int n = evbuffer_peek(input_buffer, -1, NULL, NULL, 0);
if (n > 0) {
  try {
    v_vec = std::vector<struct evbuffer_iovec>(n);
  }
  catch (const std::bad_alloc& e) {
    // Handle memory allocation failure
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INVALID_ARG,
        (std::string("Memory allocation failed for evbuffer: ") + e.what())
            .c_str());
  }
  catch (const std::exception& e) {
    // Catch any other std exceptions
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INTERNAL,
        (std::string("Exception while creating evbuffer vector: ") +
         e.what())
            .c_str());
  }
  v = v_vec.data();
  if (evbuffer_peek(input_buffer, -1, NULL, v, n) != n) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INTERNAL,
        "unexpected error getting input buffers");
  }
}
```
They could be merged into one helper, but this can be addressed later.
Two things we should have on this PR:
- Tests showing that the out-of-memory exception is caught during request memory allocation, to assert we addressed the original concern.
- A pipeline that passes all tests involving the HTTP server, to assert we are not breaking anything with this change.
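As a sketch of the merge suggested above, the repeated peek/allocate/peek pattern could be factored into one helper. Everything here is hypothetical: `PeekInputIovecs` and `PeekFn` are invented names, the injected peek function replaces `evbuffer_peek` so the sketch runs without libevent, and a `std::string` stands in for `TRITONSERVER_Error*` to keep it self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <string>
#include <vector>

// Stand-in for libevent's evbuffer_iovec so the sketch compiles standalone.
struct evbuffer_iovec {
  void* iov_base;
  size_t iov_len;
};

// Injected peek function; in the real code this would wrap evbuffer_peek on
// the request's input buffer. Calling it with (nullptr, 0) counts the
// extents; calling it with (v, n) fills them.
using PeekFn = std::function<int(struct evbuffer_iovec*, int)>;

// Hypothetical merged helper for the repeated peek/allocate/peek pattern.
bool PeekInputIovecs(
    const PeekFn& peek, std::vector<struct evbuffer_iovec>* v_vec,
    std::string* err)
{
  int n = peek(nullptr, 0);
  if (n <= 0) {
    return true;  // an empty input buffer is not an error here
  }
  try {
    *v_vec = std::vector<struct evbuffer_iovec>(n);
  }
  catch (const std::exception& e) {
    // Covers std::bad_alloc and any other standard exception.
    *err = std::string("allocating iovec vector failed: ") + e.what();
    return false;
  }
  if (peek(v_vec->data(), n) != n) {
    *err = "unexpected error getting input buffers";
    return false;
  }
  return true;
}
```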
```cpp
try {
  v_vec = std::vector<struct evbuffer_iovec>(n);
}
catch (const std::bad_alloc& e) {
  // Handle memory allocation failure
  return TRITONSERVER_ErrorNew(
      TRITONSERVER_ERROR_INVALID_ARG,
      (std::string("Memory allocation failed for evbuffer: ") + e.what())
          .c_str());
}
```
Can you add tests making sure we are catching the out-of-memory exception? That is the root reason we are doing this fix. You might want to look at setting `ulimit` when launching Triton.
To clarify, the root reason for this change is to prevent a segfault by stack overflow (which has been fixed), because the stack is much smaller, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request, etc.).

To test the out-of-memory response, I set the chunk count to n = 10,000,000 (a large chunk count takes an extremely long time, as well as local resources, to transfer), so the server allocates the metadata `std::vector<struct evbuffer_iovec>(n)`, which is 16 bytes * 10,000,000 ≈ 160 MB. The server itself needs ~18 GB to start. Because the actual payload data can be much larger than the metadata, when I send this request the payload takes far more space than 160 MB and crashes the server before it even reaches `std::vector<struct evbuffer_iovec>(n)`.

The `ulimit -v` approach here is also super sensitive to tritonserver's memory usage. If there is any change to the dependency library sizes or to our source code, the test will break and the `ulimit -v` size will need to be updated. I don't think that's reasonable or feasible.
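The size arithmetic above can be checked directly. A tiny sketch, with `evbuffer_iovec` stubbed as a pointer plus a length and assuming a typical 64-bit (LP64) build where that struct is 16 bytes; `IovecMetadataBytes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for libevent's evbuffer_iovec: a pointer plus a length,
// 16 bytes on a typical 64-bit (LP64) build.
struct evbuffer_iovec {
  void* iov_base;
  size_t iov_len;
};

// Metadata cost of the iovec vector alone, before any payload bytes.
constexpr size_t IovecMetadataBytes(size_t n) {
  return n * sizeof(struct evbuffer_iovec);
}
```

For n = 10,000,000 chunks this gives 160,000,000 bytes, the ~160 MB quoted above; the payload itself dwarfs that metadata.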
Yeah. The two repeated code blocks can't really reuse the helper functions (`EVRequestToJson`, etc.) I created; they do not contain the complete parsing logic. To clarify, the original concern is to prevent a segfault by stack overflow (which has been fixed), because the stack is much smaller, e.g. a few MB. There is no existing check for large payloads (e.g. 100 GB in a single request, etc.), and adding such a check is out of scope.

Synced offline:
@yinggeh confirmed we cannot reproduce the heap request memory allocation OOM scenario, because other parts of Triton segfault before the heap allocation OOMs.
Since the original vulnerability is no longer reproducible after the fix, and the original reproductions are added to the CI tests, we can merge this PR to get the fix into the main branch as soon as possible.
@yinggeh will follow up on verifying that all tests involving the HTTP server pass on CI.
What does the PR do?
In the HTTP request handling code, `alloca()` is used to allocate memory on the stack. By sending an HTTP request with chunked transfer encoding containing a large number of chunks, an attacker can trigger a stack overflow. This PR uses `std::vector` for safer dynamic memory allocation:

- Updates `EVRequestToJsonImpl` to replace `alloca` with `std::vector`.
- Adds the helper functions `EVRequestToJson` and `EVRequestToJsonAllowsEmpty`.
- Removes the duplicate `EVBufferToJson` implementation in `sagemaker_server.cc`.
Checklist
- PR title follows `<commit_type>: <Title>`; check the conventional commit type box and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:
- L0_http
- L0_sagemaker
- Pipeline: 32259137

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)