Skip to content

Conversation

thevilledev
Copy link
Contributor

Problem

Parsing extremely deeply nested structures (e.g., nested arrays [[[...]]], objects {a:{a:{...}}}, or long chains of unary/binary operators) could lead to excessive recursion in the C++ parser functions, potentially exhausting the native call stack and crashing the interpreter.

Solution

A constant MAX_PARSER_DEPTH (set to 1000) is introduced in core/parser.cpp. Recursive parsing functions (parse, parseArgs, parseParams, parseBind, parseObjectRemainder, parseComprehensionSpecs, parseTerminalBracketsOrUnary, maybeParseGreedy, parseInfix) now track the current recursion depth and throw a StaticError if this limit is exceeded. This ensures the parser terminates gracefully for pathological inputs instead of crashing.

The current_depth parameter is generally incremented by 1 whenever a parsing function makes a recursive call to parse a distinct sub-expression or nested structure (e.g., calling parse from within parseObjectRemainder to handle a field's value, or calling parse from parseInfix to handle the right-hand operand). This accurately tracks the syntactic nesting depth. Calls that are part of processing the same structural level (like parseParams calling parseArgs internally) might pass the depth unchanged.

A new test case (test_suite/error.parse.deep_array_nesting.jsonnet) has been added to verify that the depth limit is correctly enforced and produces the expected static error. This test case uses deeply nested arrays, which demonstrably causes a segmentation fault on the master branch due to native stack overflow during parsing, but is now caught gracefully by the introduced limit.

I also updated all function doc strings as they were either missing or outdated.

Why a constant?

Making this limit a constant (rather than a command-line configurable option like --max-stack) was a deliberate design choice at this point:

  • MAX_PARSER_DEPTH guards the parsing stage against C++ stack overflows, while --max-stack limits the evaluation stage's Jsonnet VM call stack.
  • Exceeding a parse depth of 1000 is highly unlikely for typical, human-written Jsonnet code. It often indicates a structural issue in the code itself rather than a need for a higher limit.
  • Adding a configuration option would increase CLI complexity and require plumbing the value through the library.

That being said, this is open for discussion and feedback.

NOTE: originally reported through Google OSS VRP and public disclosure was preferred.

@johnbartholomew
Copy link
Collaborator

Looks like a good improvement! It is a little tricky determining what value MAX_PARSER_DEPTH should be set to, since the available stack memory is platform-dependent. Unfortunately, avoiding such a hard-coded limit either requires a lot of restructuring or (as you note in the PR description) complexity in plumbing a runtime config value though. I guess 1000 is a reasonable starting point.

I'll try to go through the changes in a little more detail before merging but on a quick skim through this looks good.

Copy link
Collaborator

@johnbartholomew johnbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

Just a couple of minor style comments.

* This is especially important when parsing deeply nested structures that could lead to
* excessive recursion in the parser functions.
*/
static const unsigned MAX_PARSER_DEPTH = 1000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Move this into the anonymous namespace, to be consistent with the EMPTY_FODDER definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is done now

core/parser.cpp Outdated
* \returns The last token (the one that matched parameter end).
*/
Token parseArgs(ArgParams &args, const std::string &element_kind, bool &got_comma)
Token parseArgs(ArgParams &args, const std::string &element_kind, bool &got_comma, unsigned current_depth = 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's better not to provide a default value here. As far as I can tell there are only two call sites and both are changed in this PR to pass a current_depth value. And with a default of 0, forgetting to pass in the depth would be an easy way to introduce a code-path that can cause stack overflow.

The same comment applies for the other functions where you're adding a current_depth parameter; best not to give a default value for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I removed the default values.

Add a recursion depth limit to prevent stack overflow vulnerabilities
when parsing deeply nested Jsonnet expressions. This addresses potential
denial-of-service attacks where malicious inputs with excessive nesting
could crash the parser.

Key changes:
- Introduce MAX_PARSER_DEPTH constant (1000) to limit parser recursion
- Add depth parameter to all parsing functions
- Check depth limit before parsing recursively nested structures
- Throw clear error message when maximum depth is exceeded
- Improve documentation for parsing functions
- Add test case to verify limit enforcement

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Addresses the following issues raised in the review process:

- Move MAX_PARSER_DEPTH to an anomymous namespace.
- Remove default value for current depth to prevent accidentally
  bypassing the depth protection.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
@johnbartholomew johnbartholomew merged commit 7d1cbf8 into google:master Jul 23, 2025
7 checks passed
stephenamar-db pushed a commit to databricks/sjsonnet that referenced this pull request Aug 28, 2025
Resolves issue #473. 

This PR implements a configurable maximum parser recursion depth limit
to prevent stack overflow from deeply
nested Jsonnet structures, following the approach from
[google/jsonnet#1230](google/jsonnet#1230)

**Changes**
- Added maxParserRecursionDepth: Int = 1000 parameter to Settings class
- Added --max-parser-recursion-depth CLI argument to allow runtime
configuration
- Updated Settings flow: CLI → Config → Settings → Interpreter →
CachedResolver → Parser

**Before:** 
- Parsing extremely deeply nested structures (e.g., [[[...1000+
levels...]]]) could cause native stack
  overflow and crash the interpreter.

**After:**  
- Parser gracefully throws java.lang.Exception: Parsing exceeded maximum
recursion depth of 1000 instead of
  crashing. 
- Users can now configure the limit: `sjsonnet
--max-parser-recursion-depth 2000 deeply_nested.jsonnet`
- Allows tuning for specific use cases while maintaining safe defaults
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants