-
Notifications
You must be signed in to change notification settings - Fork 2k
fix: braintrust masking was over truncating #5458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR fixes a bug in the Braintrust tracing masking functionality where data was being over-truncated. The original implementation had hardcoded truncation parameters that truncated strings to approximately 1,000 characters (800 head + 200 tail) while claiming to truncate to 10,000 characters.
The fix introduces several improvements:
-
Configurable masking length: Adds a
MASKING_LENGTH
constant set to 20,000 characters, making the truncation limit explicit and easily adjustable -
Improved truncation logic: The new
_truncate_str
function allocates 4/5 of the total length to the head (16,000 chars) and 1/5 to the tail (4,000 chars), providing much more context while maintaining the sandwich approach -
Better code organization: Extracts the masking decision into a separate
_should_mask
function, improving readability and making the logic more maintainable -
Enhanced transparency: The truncation message now clearly shows the original length and target length, making it easier to understand what was truncated
This change fits into Onyx's broader evaluation and tracing infrastructure, specifically the Braintrust integration used for monitoring and debugging LLM interactions. The tracing system helps track performance and behavior across different spans (root, task, score, LLM, function, generic), and proper data preservation is crucial for effective debugging.
Confidence score: 4/5
- This PR is safe to merge with minimal risk as it fixes a clear bug without changing core functionality
- Score reflects a straightforward bug fix with improved constants and better code structure
- No files require special attention - the change is contained to a single utility function
1 file reviewed, 2 comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="backend/onyx/evals/tracing.py">
<violation number="1" location="backend/onyx/evals/tracing.py:24">
Docstring claims span-aware masking that the function does not implement, which is misleading and can confuse users.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
backend/onyx/evals/tracing.py
Outdated
if len(data_str) > 10_000: | ||
return _truncate_str(data_str) | ||
return data | ||
"""Mask data based on span type. Only mask generic and function spans, not root, task, score, or LLM spans.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring claims span-aware masking that the function does not implement, which is misleading and can confuse users.
Prompt for AI agents
Address the following comment on backend/onyx/evals/tracing.py at line 24:
<comment>Docstring claims span-aware masking that the function does not implement, which is misleading and can confuse users.</comment>
<file context>
@@ -7,18 +7,24 @@
- if len(data_str) > 10_000:
- return _truncate_str(data_str)
- return data
+ """Mask data based on span type. Only mask generic and function spans, not root, task, score, or LLM spans."""
+ if not _should_mask(data):
+ return data
</file context>
"""Mask data based on span type. Only mask generic and function spans, not root, task, score, or LLM spans.""" | |
"""Mask data by length; mask only when len(str(data)) exceeds MASKING_LENGTH.""" |
3 Jobs Failed: Run Integration Tests v2 / integration-tests (tests/connector, tests-connector) failed on "Run Integration Tests for tests-connector"
Run Integration Tests v2 / required failed on "Run actions/github-script@v7"
1 job failed running on non-Blacksmith runners. Summary: 6 successful workflows, 2 failed workflows
Last updated: 2025-09-19 20:40:42 UTC |
Description
was masking to 1000 characters (too aggressive)
How Has This Been Tested?
tested locally
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Fixes over-aggressive Braintrust tracing masking to preserve useful context while still limiting very large payloads. Increases the mask threshold to 20,000 characters and applies masking only when needed.