-
Notifications
You must be signed in to change notification settings - Fork 1
Fix structured capture redaction for deeply nested payloads #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
WalkthroughRefactors the payload redaction logic in structured_wire_capture_service to an iterative, stack-based traversal with cycle detection, preserving behavior for non-collections. Adds a unit test creating deeply nested payloads to validate redaction of API keys and placeholder emission. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Service as StructuredWireCaptureService
participant Redactor as APIKeyRedactor
Client->>Service: capture(payload)
activate Service
Service->>Service: _redact_payload(payload)
note right of Service: Iterative traversal<br/>- Use stack<br/>- Track processed_ids for cycles
alt payload is string
Service->>Redactor: redact(string)
Redactor-->>Service: redacted string
else payload is dict/list
loop for each node on stack
alt dict key/value
Service->>Redactor: redact(value if string)
Redactor-->>Service: redacted value
else list item
Service->>Redactor: redact(item if string)
Redactor-->>Service: redacted item
end
note over Service: Copy scalars as-is
end
else non-collection scalar
Service-->>Service: return value unchanged
end
Service-->>Client: redacted_root
deactivate Service
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/core/services/structured_wire_capture_service.py(1 hunks)tests/unit/core/services/test_structured_wire_capture.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Avoid using emojis in Python code comments or docstrings
Follow PEP 8 and use type hints for all functions
Import order must be: standard library, third-party, then local imports, separated by blank lines
Naming conventions: snake_case for variables and functions; PascalCase for classes
Error handling: use specific exceptions and include meaningful error messages
Prefer f-strings for string formatting
Files:
src/core/services/structured_wire_capture_service.pytests/unit/core/services/test_structured_wire_capture.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Catch specific exceptions; avoid broad except Exception blocks
If a broad exception must be caught, log with exc_info=True and re-raise a specific custom exception
Use the most specific exception class from src.core.common.exceptions that accurately describes the error
Provide clear, helpful error messages and include relevant details in the details dictionary of custom exceptions
Files:
src/core/services/structured_wire_capture_service.py
🧬 Code graph analysis (2)
src/core/services/structured_wire_capture_service.py (1)
src/core/common/logging_utils.py (1)
redact(121-138)
tests/unit/core/services/test_structured_wire_capture.py (2)
src/core/services/structured_wire_capture_service.py (1)
_redact_payload(263-328)src/security.py (1)
APIKeyRedactor(8-61)
🪛 Ruff (0.13.3)
tests/unit/core/services/test_structured_wire_capture.py
372-372: Possible hardcoded password assigned to: "secret"
(S105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10)
| processed_ids: set[int] = set() | ||
|
|
||
| while stack: | ||
| source, target = stack.pop() | ||
|
|
||
| # Skip already processed containers to avoid infinite loops if we | ||
| # ever encounter cyclic references (these are not expected for JSON | ||
| # payloads but this guard keeps the middleware resilient). | ||
| source_id = id(source) | ||
| if source_id in processed_ids: | ||
| continue | ||
| processed_ids.add(source_id) | ||
|
|
||
| if isinstance(source, dict) and isinstance(target, dict): | ||
| for key, value in source.items(): | ||
| if isinstance(value, str): | ||
| target[key] = self._redactor.redact(value) | ||
| elif isinstance(value, dict): | ||
| child: dict[Any, Any] = {} | ||
| target[key] = child | ||
| stack.append((value, child)) | ||
| elif isinstance(value, list): | ||
| child_list: list[Any] = [] | ||
| target[key] = child_list | ||
| stack.append((value, child_list)) | ||
| else: | ||
| target[key] = value | ||
| elif isinstance(source, list) and isinstance(target, list): | ||
| for item in source: | ||
| if isinstance(item, str): | ||
| target.append(self._redactor.redact(item)) | ||
| elif isinstance(item, dict): | ||
| child = {} | ||
| target.append(child) | ||
| stack.append((item, child)) | ||
| elif isinstance(item, list): | ||
| child_list = [] | ||
| target.append(child_list) | ||
| stack.append((item, child_list)) | ||
| else: | ||
| target.append(item) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve shared container aliases during iteration
When a dict/list appears in multiple places (or in a cycle), the first encounter processes it, but later encounters hit the processed_ids guard and leave the new target container empty. This breaks alias preservation and drops data for legitimate payloads that share structures (e.g., two keys pointing to the same dict). Switching to an id -> redacted_container map lets us reuse the same sanitized object without re-traversing, avoiding both data loss and infinite loops.
- processed_ids: set[int] = set()
+ seen: dict[int, Any] = {id(payload): redacted_root}
while stack:
source, target = stack.pop()
- # Skip already processed containers to avoid infinite loops if we
- # ever encounter cyclic references (these are not expected for JSON
- # payloads but this guard keeps the middleware resilient).
- source_id = id(source)
- if source_id in processed_ids:
- continue
- processed_ids.add(source_id)
-
if isinstance(source, dict) and isinstance(target, dict):
for key, value in source.items():
if isinstance(value, str):
target[key] = self._redactor.redact(value)
elif isinstance(value, dict):
- child: dict[Any, Any] = {}
- target[key] = child
- stack.append((value, child))
+ value_id = id(value)
+ child = seen.get(value_id)
+ if child is None:
+ child = {}
+ seen[value_id] = child
+ stack.append((value, child))
+ target[key] = child
elif isinstance(value, list):
- child_list: list[Any] = []
- target[key] = child_list
- stack.append((value, child_list))
+ value_id = id(value)
+ child_list = seen.get(value_id)
+ if child_list is None:
+ child_list = []
+ seen[value_id] = child_list
+ stack.append((value, child_list))
+ target[key] = child_list
else:
target[key] = value
elif isinstance(source, list) and isinstance(target, list):
for item in source:
if isinstance(item, str):
target.append(self._redactor.redact(item))
elif isinstance(item, dict):
- child = {}
- target.append(child)
- stack.append((item, child))
+ item_id = id(item)
+ child = seen.get(item_id)
+ if child is None:
+ child = {}
+ seen[item_id] = child
+ stack.append((item, child))
+ target.append(child)
elif isinstance(item, list):
- child_list = []
- target.append(child_list)
- stack.append((item, child_list))
+ item_id = id(item)
+ child_list = seen.get(item_id)
+ if child_list is None:
+ child_list = []
+ seen[item_id] = child_list
+ stack.append((item, child_list))
+ target.append(child_list)
else:
target.append(item)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| processed_ids: set[int] = set() | |
| while stack: | |
| source, target = stack.pop() | |
| # Skip already processed containers to avoid infinite loops if we | |
| # ever encounter cyclic references (these are not expected for JSON | |
| # payloads but this guard keeps the middleware resilient). | |
| source_id = id(source) | |
| if source_id in processed_ids: | |
| continue | |
| processed_ids.add(source_id) | |
| if isinstance(source, dict) and isinstance(target, dict): | |
| for key, value in source.items(): | |
| if isinstance(value, str): | |
| target[key] = self._redactor.redact(value) | |
| elif isinstance(value, dict): | |
| child: dict[Any, Any] = {} | |
| target[key] = child | |
| stack.append((value, child)) | |
| elif isinstance(value, list): | |
| child_list: list[Any] = [] | |
| target[key] = child_list | |
| stack.append((value, child_list)) | |
| else: | |
| target[key] = value | |
| elif isinstance(source, list) and isinstance(target, list): | |
| for item in source: | |
| if isinstance(item, str): | |
| target.append(self._redactor.redact(item)) | |
| elif isinstance(item, dict): | |
| child = {} | |
| target.append(child) | |
| stack.append((item, child)) | |
| elif isinstance(item, list): | |
| child_list = [] | |
| target.append(child_list) | |
| stack.append((item, child_list)) | |
| else: | |
| target.append(item) | |
| seen: dict[int, Any] = {id(payload): redacted_root} | |
| while stack: | |
| source, target = stack.pop() | |
| if isinstance(source, dict) and isinstance(target, dict): | |
| for key, value in source.items(): | |
| if isinstance(value, str): | |
| target[key] = self._redactor.redact(value) | |
| elif isinstance(value, dict): | |
| value_id = id(value) | |
| child = seen.get(value_id) | |
| if child is None: | |
| child = {} | |
| seen[value_id] = child | |
| stack.append((value, child)) | |
| target[key] = child | |
| elif isinstance(value, list): | |
| value_id = id(value) | |
| child_list = seen.get(value_id) | |
| if child_list is None: | |
| child_list = [] | |
| seen[value_id] = child_list | |
| stack.append((value, child_list)) | |
| target[key] = child_list | |
| else: | |
| target[key] = value | |
| elif isinstance(source, list) and isinstance(target, list): | |
| for item in source: | |
| if isinstance(item, str): | |
| target.append(self._redactor.redact(item)) | |
| elif isinstance(item, dict): | |
| item_id = id(item) | |
| child = seen.get(item_id) | |
| if child is None: | |
| child = {} | |
| seen[item_id] = child | |
| stack.append((item, child)) | |
| target.append(child) | |
| elif isinstance(item, list): | |
| item_id = id(item) | |
| child_list = seen.get(item_id) | |
| if child_list is None: | |
| child_list = [] | |
| seen[item_id] = child_list | |
| stack.append((item, child_list)) | |
| target.append(child_list) | |
| else: | |
| target.append(item) |
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68ec26d70c8083338f26e9e0caaa409e
Summary by CodeRabbit
Bug Fixes
Refactor
Tests