Orbital-Web (Contributor)

Description

Overhauled the search eval testing script so it can optionally take a list of relevant search results (ground truth) for each query. It also generates the modified queries more seamlessly.
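A minimal sketch of how a query file with optional relevant results might be loaded. The field names `question`, `ground_truth` and `categories`, and the names `EvalQuery` and `load_test_queries`, are hypothetical; the actual schema of test_queries.json.template is not shown in this PR conversation. Missing optional fields default to empty lists, so queries without ground truth can still be run.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvalQuery:
    """One evaluation query (hypothetical field names, not the PR's schema)."""

    question: str
    # Optional document IDs known to be relevant (the ground truth).
    ground_truth: list[str] = field(default_factory=list)
    # Optional question categories, used later for per-category aggregation.
    categories: list[str] = field(default_factory=list)


def load_test_queries(raw_json: str) -> list[EvalQuery]:
    """Parse a JSON array of query objects, tolerating missing optional fields."""
    queries = []
    for entry in json.loads(raw_json):
        queries.append(
            EvalQuery(
                question=entry["question"],
                ground_truth=list(entry.get("ground_truth", [])),
                categories=list(entry.get("categories", [])),
            )
        )
    return queries


# Usage: the second query has no ground truth and no categories.
example = """[
  {"question": "How do I reset my password?", "ground_truth": ["doc-12"], "categories": ["how-to"]},
  {"question": "What is our PTO policy?"}
]"""
queries = load_test_queries(example)
```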

Also added question categories and per-category aggregation of the metrics, to show which types of questions are getting poor search results.
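Per-category aggregation can be sketched as averaging each metric over the queries in a category. This assumes a hypothetical per-query result shape of `{"categories": [...], "metrics": {...}}`; `aggregate_by_category` is an illustrative name, not the script's actual function. A query tagged with several categories counts toward each of them, and untagged queries fall into an `uncategorized` bucket.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_category(results: list[dict]) -> dict[str, dict[str, float]]:
    """Average every metric over the queries in each category."""
    # category -> metric name -> list of per-query values
    buckets: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for result in results:
        # Empty or missing categories are grouped under "uncategorized".
        for category in result.get("categories") or ["uncategorized"]:
            for name, value in result["metrics"].items():
                buckets[category][name].append(value)
    return {
        category: {name: mean(values) for name, values in metrics.items()}
        for category, metrics in buckets.items()
    }


# Usage with made-up per-query results.
results = [
    {"categories": ["how-to"], "metrics": {"recall": 1.0}},
    {"categories": ["how-to", "policy"], "metrics": {"recall": 0.5}},
    {"categories": [], "metrics": {"recall": 0.0}},
]
summary = aggregate_by_category(results)
```

Averaging per category (rather than only globally) is what surfaces a weak category that a good overall score would otherwise hide.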

How Has This Been Tested?

Ran the script and manually computed the metrics to verify they were correct. Also inspected the soft truth set to validate it.
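For checking metrics by hand, common retrieval metrics can be written out as below. Whether util_eval.py uses exactly these definitions is an assumption, and the function names are hypothetical. Recall returns `None` when a query has no ground truth, which is one way to handle that edge case instead of silently reporting 0 or 1.

```python
from typing import Optional


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> Optional[float]:
    """Fraction of relevant documents found in the top k, or None without ground truth."""
    if not relevant:
        return None  # undefined; callers can skip this query when averaging
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Worked example: d1 (rank 2) and d2 (rank 4) are relevant; d5 was never retrieved.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
# precision@5 = 2/5, recall@5 = 2/3, reciprocal rank = 1/2
```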

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Orbital-Web requested a review from a team as a code owner on May 20, 2025 06:23

vercel bot commented May 20, 2025

The latest updates on your projects.

Name | Status | Updated (UTC)
internal-search | ✅ Ready | May 21, 2025 6:30pm

greptile-apps bot (Contributor) left a comment

PR Summary

This PR introduces a comprehensive overhaul of the search evaluation testing framework, adding structured test queries with ground truth validation and category-based analysis.

  • The new test_queries.json.template has a syntax error (a missing comma) and an incomplete second test query, both of which need to be fixed
  • search_eval_config.yaml.template reduces NUM_RETURNED_HITS from 200 to 50 and EVAL_TOPK from 20 to 5, which may impact evaluation quality
  • The new metrics system in util_eval.py compares results against both the ground truth and the reranked results, but needs better edge-case handling
  • The modular structure, with separate utility files, improves maintainability, but some error-handling gaps remain
  • Query keyword generation in util_data.py should persist the generated keywords so that results stay consistent across runs
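The persistence suggestion in the last point could look like the sketch below: generated keywords are cached in a JSON file keyed by question, and only missing questions are regenerated. `load_or_generate_keywords` and the `generate` callback are hypothetical stand-ins (for example, for an LLM call); nothing here is util_data.py's actual code.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_generate_keywords(
    questions: list[str],
    cache_path: Path,
    generate: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Return keywords per question, generating only those missing from the cache."""
    cache: dict[str, list[str]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    changed = False
    for question in questions:
        if question not in cache:
            cache[question] = generate(question)
            changed = True
    if changed:
        # Write back so later runs reuse the same keywords.
        cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return {q: cache[q] for q in questions}


# Usage: the second run reads from the cache and never calls the generator.
calls: list[str] = []


def fake_generate(question: str) -> list[str]:
    calls.append(question)
    return question.lower().split()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "keywords.json"
    first = load_or_generate_keywords(["Reset Password"], path, fake_generate)
    second = load_or_generate_keywords(["Reset Password"], path, fake_generate)
```

Because a nondeterministic generator only runs once per question, metric changes between runs reflect changes in search rather than in the rewritten queries.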

9 file(s) reviewed, 17 comment(s)

pass


warned = False

style: Using global state for the warning flag could cause issues when tests run concurrently
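One alternative to a module-level `warned` flag is to keep the "already warned" state on an object and guard it with a lock, so each evaluation run owns its own state and concurrent callers warn exactly once. `WarnOnce` is a hypothetical helper sketched here, not code from the PR.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class WarnOnce:
    """Log each distinct warning at most once per instance, safely across threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen: set[str] = set()

    def warn(self, message: str) -> bool:
        """Log `message` the first time it is seen; return True if it was logged."""
        with self._lock:
            if message in self._seen:
                return False
            self._seen.add(message)
        logger.warning(message)
        return True


# Usage: eight threads hit the same warning, but it is logged only once.
warner = WarnOnce()
results: list[bool] = []
results_lock = threading.Lock()


def worker() -> None:
    logged = warner.warn("ground truth missing for some queries")
    with results_lock:
        results.append(logged)


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```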

evan-onyx (Contributor) left a comment

some nits!

Merged via the queue into main with commit 9dbe12c May 21, 2025
11 checks passed
Orbital-Web deleted the search-quality-docspec branch on May 21, 2025 20:21
ferdinandl007 pushed a commit to ferdinandl007/onyx that referenced this pull request May 27, 2025
…query, etc.) (onyx-dot-app#4739)

* fix: autoflake & import order

* docs: readme

* fix: mypy

* feat: eval

* docs: readme

* fix: oops forgot to remove comment

* fix: typo

* fix: rename var

* updated default config

* fix: config issue

* oops

* fix: black

* fix: eval and config

* feat: non tool calling query mod
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request May 27, 2025
ZhipengHe pushed a commit to ZhipengHe/onyx that referenced this pull request Jun 6, 2025
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025

2 participants