-
Notifications
You must be signed in to change notification settings - Fork 466
feat: introduce maximum parser recursion depth #1230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like a good improvement! It is a little tricky determining what value I'll try to go through the changes in a little more detail before merging but on a quick skim through this looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
Just a couple of minor style comments.
* This is especially important when parsing deeply nested structures that could lead to | ||
* excessive recursion in the parser functions. | ||
*/ | ||
static const unsigned MAX_PARSER_DEPTH = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Move this into the anonymous namespace, to be consistent with the EMPTY_FODDER
definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is done now
core/parser.cpp
Outdated
* \returns The last token (the one that matched parameter end). | ||
*/ | ||
Token parseArgs(ArgParams &args, const std::string &element_kind, bool &got_comma) | ||
Token parseArgs(ArgParams &args, const std::string &element_kind, bool &got_comma, unsigned current_depth = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it's better not to provide a default value here. As far as I can tell there are only two call sites and both are changed in this PR to pass a current_depth value. And with a default of 0, forgetting to pass in the depth would be an easy way to introduce a code-path that can cause stack overflow.
The same comment applies for the other functions where you're adding a current_depth
parameter; best not to give a default value for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I removed the default values.
Add a recursion depth limit to prevent stack overflow vulnerabilities when parsing deeply nested Jsonnet expressions. This addresses potential denial-of-service attacks where malicious inputs with excessive nesting could crash the parser. Key changes: - Introduce MAX_PARSER_DEPTH constant (1000) to limit parser recursion - Add depth parameter to all parsing functions - Check depth limit before parsing recursively nested structures - Throw clear error message when maximum depth is exceeded - Improve documentation for parsing functions - Add test case to verify limit enforcement Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Addresses the following issues raised in the review process: - Move MAX_PARSER_DEPTH to an anomymous namespace. - Remove default value for current depth to prevent accidentally bypassing the depth protection. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
653dab3
to
7d1cbf8
Compare
Resolves issue #473. This PR implements a configurable maximum parser recursion depth limit to prevent stack overflow from deeply nested Jsonnet structures, following the approach from [google/jsonnet#1230](google/jsonnet#1230) **Changes** - Added maxParserRecursionDepth: Int = 1000 parameter to Settings class - Added --max-parser-recursion-depth CLI argument to allow runtime configuration - Updated Settings flow: CLI → Config → Settings → Interpreter → CachedResolver → Parser **Before:** - Parsing extremely deeply nested structures (e.g., [[[...1000+ levels...]]]) could cause native stack overflow and crash the interpreter. **After:** - Parser gracefully throws java.lang.Exception: Parsing exceeded maximum recursion depth of 1000 instead of crashing. - Users can now configure the limit: `sjsonnet --max-parser-recursion-depth 2000 deeply_nested.jsonnet` - Allows tuning for specific use cases while maintaining safe defaults
Problem
Parsing extremely deeply nested structures (e.g., nested arrays [[[...]]], objects {a:{a:{...}}}, or long chains of unary/binary operators) could lead to excessive recursion in the C++ parser functions, potentially exhausting the native call stack and crashing the interpreter.
Solution
A constant
MAX_PARSER_DEPTH
(set to 1000) is introduced incore/parser.cpp
. Recursive parsing functions (parse
,parseArgs
,parseParams
,parseBind
,parseObjectRemainder
,parseComprehensionSpecs
,parseTerminalBracketsOrUnary
,maybeParseGreedy
,parseInfix
) now track the current recursion depth and throw aStaticError
if this limit is exceeded. This ensures the parser terminates gracefully for pathological inputs instead of crashing.The
current_depth
parameter is generally incremented by 1 whenever a parsing function makes a recursive call to parse a distinct sub-expression or nested structure (e.g., callingparse
from withinparseObjectRemainder
to handle a field's value, or callingparse
fromparseInfix
to handle the right-hand operand). This accurately tracks the syntactic nesting depth. Calls that are part of processing the same structural level (likeparseParams
callingparseArgs
internally) might pass the depth unchanged.A new test case (
test_suite/error.parse.deep_array_nesting.jsonnet
) has been added to verify that the depth limit is correctly enforced and produces the expected static error. This test case uses deeply nested arrays, which demonstrably causes a segmentation fault on the master branch due to native stack overflow during parsing, but is now caught gracefully by the introduced limit.I also updated all function doc strings as they were either missing or outdated.
Why a constant?
Making this limit a constant (rather than a command-line configurable option like
--max-stack
) was a deliberate design choice at this point:MAX_PARSER_DEPTH
guards the parsing stage against C++ stack overflows, while--max-stack
limits the evaluation stage's Jsonnet VM call stack.That being said, this is open for discussion and feedback.
NOTE: originally reported through Google OSS VRP and public disclosure was preferred.