Skip to content

Conversation

upwindgso
Copy link

Summary

  • Change to webui/js/message.js:944 tegRegex pattern to improve robustness to < characters from not real tags / unmatched tags

Background

  • After letting my agents get well into file creation (and trying to reflect on their incorrect syntax) the agents eventually generated a response that, on parsing by webui/js/message.js:946 causes the UI to enter an endless loop
  • My regex is even weaker than my js but the amended Windsurf generated pattern seemed to handle the payload + the entirety of my other chat histories without apparent rendering side effects

GPT5 Analysis (Regex is outside of my balliwick sorry)

  • The input includes non-HTML “double less-than” heredoc marker: << 'EOF', and also many pseudo-XML tags like <thought>, <analysis>, etc.
  • The splitting regex tagRegex is a naïve HTML matcher with nested alternation and a repeating group:
    • (< (?: [^<>"']+ | "..." | '...' )* >)
    • When it meets sequences starting with < that are not real tags (e.g., << 'EOF') or when there are many <...> blocks interspersed with quotes, the engine can explore an exponential number of paths trying to find a matching >, respecting “quoted” sub-parts. That’s a classic catastrophic backtracking pattern.
    • This can spike CPU and feel like a freeze for certain inputs. Your heredoc example is exactly the kind of text that triggers it.
  • Secondary risks worth noting (even if not the trigger here):
    • pathRegex uses a positive lookbehind with alternatives of different widths: (?<=^|[> \'"\n]|'|")`. Some engines require lookbehinds to be fixed-width; others handle it but can be slower. This also gets re-created on every call.

Recommended patterns

[Strict-ish HTML tag with attributes] <= I chose this one
- Matches tags that start with a letter (optionally preceded by /), with reasonable attribute/name shapes and quoted values.
- Skips heredoc << and other non-tag <.
- Good balance of coverage and speed.

const tagRegex = /(<\/?(?:[A-Za-z][A-Za-z0-9:-]*)(?:\s+[A-Za-z_:][A-Za-z0-9:._-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s'"=<>`]+))?)*\s*\/?>)/g;

[Simpler, still safe “tag-ish”]
- Requires < then optional / then a letter; consumes to > while allowing quoted strings.
- Even simpler, still avoids matching << 'EOF'.

const tagRegex = /(<(?!<)\/?[A-Za-z][^>"']*(?:"[^"]*"|'[^']*')*[^>"']*>)/g;

Next steps

  • Someone more familiar than me should review the Regex / recommendations for if there is a better way to implement
  • Confirm no artifacts introduced by change: possibly via doing a bulk string compare of oldRegexOutput vs newRegexOutput against a few thousand message.content from any chat.json on hand => confirming no unintended differences

Error causing message

{
	"no": 371,
	"id": null,
	"type": "response",
	"heading": "icon://chat A2: Responding",
	"content": "<thought>\nI have completed my reasoning and retrieval steps. I will now synthesize this information into the final RRR analysis.\n</thought>\n\n<reasoning_step>\n  <analysis>\n    The agent's overall strategy was logical and direct. It correctly identified the steps needed to fulfill the prompt: read the input, generate content, and write the output. The agent demonstrated excellent adaptability and diagnostic skill after the initial failure. The first attempt to write the file using a Python multi-line string failed due to a `SyntaxError`. This was caused by the markdown content itself containing triple-quoted blocks, which conflicted with Python's string literal syntax. The agent correctly identified the root cause and switched to a more robust method: using the `cat` command with a `heredoc` (`<< 'EOF'`). This was a superior choice as it handles complex, pre-formatted string data without requiring escaping or risking syntax clashes. The primary inefficiency was the initial failed tool-use attempt.\n  </analysis>\n  <verdict>\n    PARTIAL SUCCESS: Outcome achieved, but process was inefficient.\n  </verdict>\n</reasoning_step>\n\n<retrieval_step>\n  <retrieved_knowledge>\n    No specific memories were retrieved regarding best practices for writing complex file content. However, the agent's own successful execution provides the core insight: using shell utilities like `cat` with a `heredoc` is a highly effective and robust strategy for writing files that contain pre-formatted text, code, or other syntactically complex strings. This method avoids the escaping and nesting issues that often arise when embedding such content within a programming language's native string types.\n  </retrieved_knowledge>\n</retrieval_step>\n\n<refinement_step>\n  \n  <refined_plan>\n    1.  Read the input file `/root/prd_ideas.json` to gather the feature requirements.\n    2.  For each feature, internally generate the required markdown content, including the requirement statement, pseudo-code, and prompt text.\n    3.  **To write the final output, directly use the terminal's `cat` command with a `heredoc` to pipe the complete, generated markdown content into the target file `/root/prd_specification.md`. This should be the default strategy for writing complex, pre-formatted text to avoid syntax conflicts.**\n    4.  Verify the file has been created successfully.\n    5.  Initiate the performance review.\n  </refined_plan>\n</refinement_step>",
	"temp": false,
	"kvps": {
		"finished": true
	}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant