Skip to content

Conversation

@matdev83
Copy link
Owner

@matdev83 matdev83 commented Oct 12, 2025

Summary

  • replace the recursive redaction helper in the structured wire capture service with an iterative stack-based traversal so extremely deep payloads can no longer trigger RecursionError crashes
  • add a regression test that builds a >1000-level nested payload and verifies the redaction keeps the proxy safe from API key leaks

Testing

  • python -m pytest -o addopts="" tests/unit/core/services/test_structured_wire_capture.py
  • python -m pytest -o addopts=""

https://chatgpt.com/codex/tasks/task_e_68ec26d70c8083338f26e9e0caaa409e

Summary by CodeRabbit

  • Bug Fixes

    • Improved sensitive data redaction to safely handle deeply nested or cyclic data without errors, preventing crashes and ensuring consistent placeholder output while preserving non-sensitive fields.
  • Refactor

    • Reworked redaction logic to an iterative approach with cycle detection, enhancing robustness and stability for complex payloads.
  • Tests

    • Added comprehensive tests for deeply nested structures to verify reliable API key redaction and correct placeholder behavior.

@coderabbitai
Copy link

coderabbitai bot commented Oct 12, 2025

Walkthrough

Refactors the payload redaction logic in structured_wire_capture_service to an iterative, stack-based traversal with cycle detection, preserving behavior for non-collections. Adds a unit test creating deeply nested payloads to validate redaction of API keys and placeholder emission.

Changes

Cohort / File(s) Summary of Edits
Core redaction traversal refactor
src/core/services/structured_wire_capture_service.py
Converted _redact_payload from recursive to iterative using an explicit stack; added cycle detection via processed_ids; pre-check and handling for string payloads; deep-copy redaction for dicts/lists; expanded docstrings and inline comments; maintained passthrough for non-collection scalars.
Unit tests for deep nesting
tests/unit/core/services/test_structured_wire_capture.py
Added helpers to build deeply nested structures; imported Any and APIKeyRedactor; introduced test to ensure redaction works on deep nests and produces the "(API_KEY_HAS_BEEN_REDACTED)" placeholder.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Service as StructuredWireCaptureService
  participant Redactor as APIKeyRedactor

  Client->>Service: capture(payload)
  activate Service
  Service->>Service: _redact_payload(payload)
  note right of Service: Iterative traversal<br/>- Use stack<br/>- Track processed_ids for cycles

  alt payload is string
    Service->>Redactor: redact(string)
    Redactor-->>Service: redacted string
  else payload is dict/list
    loop for each node on stack
      alt dict key/value
        Service->>Redactor: redact(value if string)
        Redactor-->>Service: redacted value
      else list item
        Service->>Redactor: redact(item if string)
        Redactor-->>Service: redacted item
      end
      note over Service: Copy scalars as-is
    end
  else non-collection scalar
    Service-->>Service: return value unchanged
  end

  Service-->>Client: redacted_root
  deactivate Service
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I hop through stacks, not down a hole,
Chasing keys in a leafy scroll.
Loops and sets keep cycles tame,
Secrets blurred, none left to name.
Thump-thump! The payload’s clean and neat—
A burrow of bytes, now safe to meet. 🐇🔐

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and concisely summarizes the key fix by indicating that the structured capture redaction now supports deeply nested payloads. It directly reflects the iterative, stack-based refactoring of the redaction helper and conveys the primary change in a single, specific sentence.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/fix-remote-attack-vector-bug

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd216cf and d15d901.

📒 Files selected for processing (2)
  • src/core/services/structured_wire_capture_service.py (1 hunks)
  • tests/unit/core/services/test_structured_wire_capture.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Avoid using emojis in Python code comments or docstrings
Follow PEP 8 and use type hints for all functions
Import order must be: standard library, third-party, then local imports, separated by blank lines
Naming conventions: snake_case for variables and functions; PascalCase for classes
Error handling: use specific exceptions and include meaningful error messages
Prefer f-strings for string formatting

Files:

  • src/core/services/structured_wire_capture_service.py
  • tests/unit/core/services/test_structured_wire_capture.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Catch specific exceptions; avoid broad except Exception blocks
If a broad exception must be caught, log with exc_info=True and re-raise a specific custom exception
Use the most specific exception class from src.core.common.exceptions that accurately describes the error
Provide clear, helpful error messages and include relevant details in the details dictionary of custom exceptions

Files:

  • src/core/services/structured_wire_capture_service.py
🧬 Code graph analysis (2)
src/core/services/structured_wire_capture_service.py (1)
src/core/common/logging_utils.py (1)
  • redact (121-138)
tests/unit/core/services/test_structured_wire_capture.py (2)
src/core/services/structured_wire_capture_service.py (1)
  • _redact_payload (263-328)
src/security.py (1)
  • APIKeyRedactor (8-61)
🪛 Ruff (0.13.3)
tests/unit/core/services/test_structured_wire_capture.py

372-372: Possible hardcoded password assigned to: "secret"

(S105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (3.10)

Comment on lines +286 to +327
processed_ids: set[int] = set()

while stack:
source, target = stack.pop()

# Skip already processed containers to avoid infinite loops if we
# ever encounter cyclic references (these are not expected for JSON
# payloads but this guard keeps the middleware resilient).
source_id = id(source)
if source_id in processed_ids:
continue
processed_ids.add(source_id)

if isinstance(source, dict) and isinstance(target, dict):
for key, value in source.items():
if isinstance(value, str):
target[key] = self._redactor.redact(value)
elif isinstance(value, dict):
child: dict[Any, Any] = {}
target[key] = child
stack.append((value, child))
elif isinstance(value, list):
child_list: list[Any] = []
target[key] = child_list
stack.append((value, child_list))
else:
target[key] = value
elif isinstance(source, list) and isinstance(target, list):
for item in source:
if isinstance(item, str):
target.append(self._redactor.redact(item))
elif isinstance(item, dict):
child = {}
target.append(child)
stack.append((item, child))
elif isinstance(item, list):
child_list = []
target.append(child_list)
stack.append((item, child_list))
else:
target.append(item)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve shared container aliases during iteration

When a dict/list appears in multiple places (or in a cycle), the first encounter processes it, but later encounters hit the processed_ids guard and leave the new target container empty. This breaks alias preservation and drops data for legitimate payloads that share structures (e.g., two keys pointing to the same dict). Switching to an id -> redacted_container map lets us reuse the same sanitized object without re-traversing, avoiding both data loss and infinite loops.

-        processed_ids: set[int] = set()
+        seen: dict[int, Any] = {id(payload): redacted_root}
 
         while stack:
             source, target = stack.pop()
 
-            # Skip already processed containers to avoid infinite loops if we
-            # ever encounter cyclic references (these are not expected for JSON
-            # payloads but this guard keeps the middleware resilient).
-            source_id = id(source)
-            if source_id in processed_ids:
-                continue
-            processed_ids.add(source_id)
-
             if isinstance(source, dict) and isinstance(target, dict):
                 for key, value in source.items():
                     if isinstance(value, str):
                         target[key] = self._redactor.redact(value)
                     elif isinstance(value, dict):
-                        child: dict[Any, Any] = {}
-                        target[key] = child
-                        stack.append((value, child))
+                        value_id = id(value)
+                        child = seen.get(value_id)
+                        if child is None:
+                            child = {}
+                            seen[value_id] = child
+                            stack.append((value, child))
+                        target[key] = child
                     elif isinstance(value, list):
-                        child_list: list[Any] = []
-                        target[key] = child_list
-                        stack.append((value, child_list))
+                        value_id = id(value)
+                        child_list = seen.get(value_id)
+                        if child_list is None:
+                            child_list = []
+                            seen[value_id] = child_list
+                            stack.append((value, child_list))
+                        target[key] = child_list
                     else:
                         target[key] = value
             elif isinstance(source, list) and isinstance(target, list):
                 for item in source:
                     if isinstance(item, str):
                         target.append(self._redactor.redact(item))
                     elif isinstance(item, dict):
-                        child = {}
-                        target.append(child)
-                        stack.append((item, child))
+                        item_id = id(item)
+                        child = seen.get(item_id)
+                        if child is None:
+                            child = {}
+                            seen[item_id] = child
+                            stack.append((item, child))
+                        target.append(child)
                     elif isinstance(item, list):
-                        child_list = []
-                        target.append(child_list)
-                        stack.append((item, child_list))
+                        item_id = id(item)
+                        child_list = seen.get(item_id)
+                        if child_list is None:
+                            child_list = []
+                            seen[item_id] = child_list
+                            stack.append((item, child_list))
+                        target.append(child_list)
                     else:
                         target.append(item)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
processed_ids: set[int] = set()
while stack:
source, target = stack.pop()
# Skip already processed containers to avoid infinite loops if we
# ever encounter cyclic references (these are not expected for JSON
# payloads but this guard keeps the middleware resilient).
source_id = id(source)
if source_id in processed_ids:
continue
processed_ids.add(source_id)
if isinstance(source, dict) and isinstance(target, dict):
for key, value in source.items():
if isinstance(value, str):
target[key] = self._redactor.redact(value)
elif isinstance(value, dict):
child: dict[Any, Any] = {}
target[key] = child
stack.append((value, child))
elif isinstance(value, list):
child_list: list[Any] = []
target[key] = child_list
stack.append((value, child_list))
else:
target[key] = value
elif isinstance(source, list) and isinstance(target, list):
for item in source:
if isinstance(item, str):
target.append(self._redactor.redact(item))
elif isinstance(item, dict):
child = {}
target.append(child)
stack.append((item, child))
elif isinstance(item, list):
child_list = []
target.append(child_list)
stack.append((item, child_list))
else:
target.append(item)
seen: dict[int, Any] = {id(payload): redacted_root}
while stack:
source, target = stack.pop()
if isinstance(source, dict) and isinstance(target, dict):
for key, value in source.items():
if isinstance(value, str):
target[key] = self._redactor.redact(value)
elif isinstance(value, dict):
value_id = id(value)
child = seen.get(value_id)
if child is None:
child = {}
seen[value_id] = child
stack.append((value, child))
target[key] = child
elif isinstance(value, list):
value_id = id(value)
child_list = seen.get(value_id)
if child_list is None:
child_list = []
seen[value_id] = child_list
stack.append((value, child_list))
target[key] = child_list
else:
target[key] = value
elif isinstance(source, list) and isinstance(target, list):
for item in source:
if isinstance(item, str):
target.append(self._redactor.redact(item))
elif isinstance(item, dict):
item_id = id(item)
child = seen.get(item_id)
if child is None:
child = {}
seen[item_id] = child
stack.append((item, child))
target.append(child)
elif isinstance(item, list):
item_id = id(item)
child_list = seen.get(item_id)
if child_list is None:
child_list = []
seen[item_id] = child_list
stack.append((item, child_list))
target.append(child_list)
else:
target.append(item)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant