# [OPIK-2469] [BE] Fix duplicate experiment items in dataset comparison #3633
Changes from all commits:

- 0e76cd6
- bcc09b9
- 96db251
- b69f069
- f83875a
- 55ab470
```diff
@@ -681,14 +681,14 @@ ORDER BY (workspace_id, project_id, id) DESC, last_updated_at DESC
 LEFT JOIN feedback_scores_final AS fs ON t.id = fs.entity_id
 LEFT JOIN comments_final AS c ON t.id = c.entity_id
 LEFT JOIN (
-    SELECT
-        trace_id,
-        SUM(total_estimated_cost) AS total_estimated_cost,
-        sumMap(usage) AS usage
-    FROM spans final
-    WHERE workspace_id = :workspace_id
-    AND trace_id IN (SELECT trace_id FROM experiment_items_scope)
-    GROUP BY workspace_id, project_id, trace_id
+    SELECT
+        trace_id,
+        SUM(total_estimated_cost) AS total_estimated_cost,
+        sumMap(usage) AS usage
+    FROM spans final
+    WHERE workspace_id = :workspace_id
+    AND trace_id IN (SELECT trace_id FROM experiment_items_scope)
+    GROUP BY workspace_id, trace_id
```
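The mechanism behind the duplicates can be illustrated outside ClickHouse: when a trace has spans in more than one project, grouping the span aggregates by `(workspace_id, project_id, trace_id)` yields one row per project for that trace, and the later join on `t.id = s.trace_id` then fans out into duplicate experiment items. A minimal Python sketch of that grouping behavior (the data and names are hypothetical, not the real schema):

```python
from collections import defaultdict

# Hypothetical span rows: (workspace_id, project_id, trace_id, cost)
spans = [
    ("ws1", "projA", "trace1", 1.0),
    ("ws1", "projB", "trace1", 2.0),  # same trace, span in another project
    ("ws1", "projA", "trace2", 3.0),
]

def aggregate(rows, key_fn):
    """SUM(cost) per group, mimicking the GROUP BY in the spans subquery."""
    totals = defaultdict(float)
    for ws, proj, trace, cost in rows:
        totals[key_fn(ws, proj, trace)] += cost
    return totals

# Old: GROUP BY workspace_id, project_id, trace_id
old = aggregate(spans, lambda ws, proj, trace: (ws, proj, trace))
# trace1 appears twice, so a JOIN ... ON trace_id duplicates the trace row
rows_for_trace1_old = [k for k in old if k[2] == "trace1"]

# New: GROUP BY workspace_id, trace_id
new = aggregate(spans, lambda ws, proj, trace: (ws, trace))
rows_for_trace1_new = [k for k in new if k[1] == "trace1"]

print(len(rows_for_trace1_old))  # 2 rows -> duplicate experiment items
print(len(rows_for_trace1_new))  # 1 row  -> one aggregate per trace
```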
**Collaborator (Author)** commented:

**Performance analysis: GROUP BY change.** I've analyzed the performance implications of removing `project_id` from the subquery's `GROUP BY`.

**Current fix (removing `project_id`):** `GROUP BY workspace_id, trace_id`.

**Alternative approach (keeping `project_id` plus `groupUniqArray`):** `GROUP BY workspace_id, project_id, trace_id`, then use `groupUniqArray` on line 622 to deduplicate.

**Performance comparison:** the current fix performs better; the alternative performs worse.

**Real-world impact:** with cross-project traces (10 traces, 7 of them with spans in 2 projects), the fix yields ~41% fewer rows to process in the subsequent joins and aggregations.

**Conclusion:** the current fix is the better solution. The performance concern about "many traces and spans" is actually addressed by this fix, because it shrinks the intermediate result set by ~40% for workspaces with cross-project traces.
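The ~41% figure follows from simple row counting under the scenario stated in the comment (10 traces, 7 of which have spans in 2 projects); a quick check, with the comment's hypothetical workload as the input:

```python
traces = 10
cross_project = 7  # traces whose spans live in 2 projects

# GROUP BY workspace_id, project_id, trace_id:
# single-project traces yield 1 row, cross-project traces yield 2 rows
rows_old = (traces - cross_project) * 1 + cross_project * 2

# GROUP BY workspace_id, trace_id: always 1 row per trace
rows_new = traces

reduction = (rows_old - rows_new) / rows_old
print(rows_old, rows_new, round(reduction * 100, 1))  # 17 10 41.2
```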
**Member** replied:

I'm not fully convinced by this summary, especially the ~40% gain. The best way to settle it would be to present objective data, in the form of the query plans before and after the change, particularly for the queries where the issue was reported. Anyway, this is more a wish on my end; not a blocker to continue.
```diff
 ) s ON t.id = s.trace_id
 GROUP BY
     t.id,
```
Nit: I'd restore the previous indentation. It looked right before this change.