fix: HTML tag regex pattern to prevent infinite loop on setMessage #712
Summary
Background
GPT5 Analysis (Regex is outside of my bailiwick, sorry)
- The message contains `<< 'EOF'`, and also many pseudo-XML tags like `<thought>`, `<analysis>`, etc.
- `tagRegex` is a naïve HTML matcher with nested alternation and a repeating group: `(< (?: [^<>"']+ | "..." | '...' )* >)`
- When the input contains `<` characters that are not real tags (e.g., `<< 'EOF'`), or when there are many `<...>` blocks interspersed with quotes, the engine can explore an exponential number of paths trying to find a matching `>` while respecting "quoted" sub-parts. That's a classic catastrophic backtracking pattern.
- `pathRegex` uses a positive lookbehind with alternatives of different widths: ``(?<=^|[> \`'"\n]|'|")``. Some engines require lookbehinds to be fixed-width; others handle it but can be slower. This also gets re-created on every call.
Recommended patterns
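Before the specific patterns, here is a minimal sketch of why the shape of the repeating group matters. It is not the code from this PR: the quoted alternatives are elided in the analysis, so `"[^"]*"` and `'[^']*'` are assumed, and the names `nestedTagRegex`, `flatTagRegex`, and `timeMatch` are hypothetical. Both regexes are module-level constants, so they are compiled once rather than on every call.

```typescript
// Hypothetical stand-ins: the quoted alternatives are elided ("...") in the
// analysis, so "[^"]*" and '[^']*' are assumed here.

// Problematic shape: a '+' nested inside a '*' group.
const nestedTagRegex: RegExp = /<(?:[^<>"']+|"[^"]*"|'[^']*')*>/;

// Accepts the same strings, but each plain character can only be consumed one way.
const flatTagRegex: RegExp = /<(?:[^<>"']|"[^"]*"|'[^']*')*>/;

// Runs one test and reports the result together with the elapsed wall-clock time.
function timeMatch(re: RegExp, input: string): { matched: boolean; ms: number } {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

// A '<' followed by plain text and no closing '>' makes a backtracking engine
// try every way of splitting the run of 'a's before it can report failure.
const unclosed: string = "<" + "a".repeat(20);

const nestedResult = timeMatch(nestedTagRegex, unclosed);
const flatResult = timeMatch(flatTagRegex, unclosed);
console.log(nestedResult, flatResult);
```

In a backtracking engine the work for the nested form roughly doubles with each extra character in the unclosed run, while the flat form grows linearly, so the input above is kept deliberately short.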
[Strict-ish HTML tag with attributes] <= I chose this one
- Matches tags that start with a letter (optionally preceded by `/`), with reasonable attribute/name shapes and quoted values.
- Skips heredoc `<<` and other non-tag `<`.
- Good balance of coverage and speed.
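The actual pattern is not reproduced in this description, so the following is only a sketch of a regex with the properties listed for the strict-ish variant. The pattern, `strictTagRegex`, and `findStrictTags` are assumptions made for illustration and may differ from what the PR applies.

```typescript
// Hypothetical strict-ish tag matcher (an assumption, not the PR's pattern):
//   <  optional '/'  a letter  further name characters
//   zero or more attributes: whitespace, a name, optionally = and a
//   double-quoted, single-quoted, or unquoted value
//   optional whitespace, optional '/', then '>'
const strictTagRegex: RegExp =
  /<\/?[A-Za-z][A-Za-z0-9:-]*(?:\s+[^\s"'<>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>=]+))?)*\s*\/?>/g;

// Returns every substring that looks like a tag under the sketch above.
function findStrictTags(text: string): string[] {
  return text.match(strictTagRegex) || [];
}

const strictSample: string =
  "cat << 'EOF'\n<thought>plan</thought>\n<a href=\"x\" disabled>link</a>\nEOF";

console.log(findStrictTags(strictSample));
```

Because the character after `<` must be `/` or a letter, the heredoc marker `<< 'EOF'` is rejected at the first step instead of being scanned for a closing `>`.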
[Simpler, still safe "tag-ish"]
- Requires `<` then optional `/` then a letter; consumes to `>` while allowing quoted strings.
- Even simpler, still avoids matching `<< 'EOF'`.
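As with the strict-ish variant, the pattern itself is not shown here, so this is a sketch under the same assumptions. `simpleTagRegex` and `findSimpleTags` are hypothetical names, and the regex is one possible reading of the description above.

```typescript
// Hypothetical "tag-ish" matcher (an assumption, not the PR's pattern):
//   <  optional '/'  a letter, then anything up to '>' where quoted strings
//   are consumed whole so a '>' inside quotes does not end the tag.
// The group repeats single characters (no inner '+'), which avoids nested quantifiers.
const simpleTagRegex: RegExp = /<\/?[A-Za-z](?:[^<>"']|"[^"]*"|'[^']*')*>/g;

// Returns every substring that looks like a tag under the sketch above.
function findSimpleTags(text: string): string[] {
  return text.match(simpleTagRegex) || [];
}

const simpleSample: string =
  "cat << 'EOF'\n<analysis note='a > b'>ok</analysis>";

console.log(findSimpleTags(simpleSample));
```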
Next steps
Error causing message