Feat: Search Eval Testing Overhaul (provide ground truth, categorize query, etc.) #4739
Conversation
PR Summary
This PR introduces a comprehensive overhaul of the search evaluation testing framework, adding structured test queries with ground truth validation and category-based analysis.
- The new `test_queries.json.template` has a syntax error (a missing comma) and an incomplete second test query that needs to be fixed
- `search_eval_config.yaml.template` reduces `NUM_RETURNED_HITS` from 200 to 50 and `EVAL_TOPK` from 20 to 5, which may impact evaluation quality
- The new metrics system in `util_eval.py` compares against both ground truth and reranked results but needs better edge-case handling
- The modular code structure with separate utility files improves maintainability, but some error-handling gaps exist
- Query keyword generation in `util_data.py` should persist generated keywords to prevent result inconsistency across runs
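The last point, persisting generated keywords so that repeated eval runs stay comparable, could be addressed with a small on-disk cache. This is only an illustrative sketch; `get_query_keywords`, the cache file layout, and the `generate_fn` callback are hypothetical names, not the actual `util_data.py` API.

```python
import json
from pathlib import Path
from typing import Callable


def get_query_keywords(
    query: str,
    cache_path: Path,
    generate_fn: Callable[[str], list[str]],
) -> list[str]:
    """Return keywords for a query, persisting generated results so that
    repeated eval runs reuse identical keywords instead of regenerating
    them. Hypothetical sketch, not the PR's actual implementation."""
    cache: dict[str, list[str]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())
    if query not in cache:
        # Only call the (possibly nondeterministic) generator on a cache miss,
        # then write the cache back so later runs see the same keywords.
        cache[query] = generate_fn(query)
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[query]
```

With a JSON cache like this, a nondeterministic keyword generator (e.g. an LLM call) only runs once per query, so metric changes between runs reflect search changes rather than keyword drift.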
9 file(s) reviewed, 17 comment(s) (Greptile)
backend/tests/regression/search_quality/test_queries.json.template
pass
warned = False
style: Using global state for the warning flag could cause issues in concurrent test execution
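One way to address this review comment is to scope the flag to an object and guard it with a lock, so concurrent tests each hold their own warner instead of sharing a module-level `warned = False`. A minimal sketch; `OneTimeWarner` and `warn_once` are illustrative names, not the PR's code.

```python
import threading


class OneTimeWarner:
    """Instance-scoped, thread-safe replacement for a module-level
    `warned = False` flag. Each test (or test worker) can create its
    own instance, so concurrent runs do not interfere."""

    def __init__(self) -> None:
        self._warned = False
        self._lock = threading.Lock()

    def warn_once(self, message: str) -> bool:
        """Emit the warning only on the first call; return True if emitted."""
        with self._lock:
            if self._warned:
                return False
            self._warned = True
        print(f"WARNING: {message}")
        return True
```

Alternatively, the standard library's `warnings` module with per-test filter state serves a similar purpose without any hand-rolled flag.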
some nits!
…query, etc.) (onyx-dot-app#4739)
* fix: autoflake & import order
* docs: readme
* fix: mypy
* feat: eval
* docs: readme
* fix: oops forgot to remove comment
* fix: typo
* fix: rename var
* updated default config
* fix: config issue
* oops
* fix: black
* fix: eval and config
* feat: non tool calling query mod
Description
Greatly overhauled the search eval testing script to accept an optional list of relevant search results (ground truth). It also generates the modified queries more seamlessly.
Also added question categories and category-based aggregation to identify which types of search queries are performing poorly.
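The category-based aggregation described above amounts to grouping per-query scores by category and averaging within each group. A minimal sketch, assuming a per-query result dict with `category` and `score` fields; these field names are illustrative, not the actual script's schema.

```python
from collections import defaultdict


def aggregate_by_category(results: list[dict]) -> dict[str, float]:
    """Average a per-query metric within each question category, so that
    poorly performing query types stand out. Illustrative sketch only."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for r in results:
        # Group each query's score under its category label.
        buckets[r["category"]].append(r["score"])
    # Mean score per category.
    return {cat: sum(scores) / len(scores) for cat, scores in buckets.items()}
```

A low average for one category (e.g. keyword-style versus natural-language questions) then points directly at which kind of search traffic needs attention.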
How Has This Been Tested?
Ran the script and manually computed metrics to verify correctness. Also inspected the soft truth set to validate it.
Backporting (check the box to trigger backport action)
Note: you must verify that the backport action passes; otherwise, resolve the conflicts manually and tag the patches.