Orbital-Web
Contributor

Description

  • Created run_search_eval.py and generate_search_queries.py in onyx/backend/test/regression/search_quality
  • Added other dependencies too

What does it do?

  • run_search_eval.py runs a set of queries locally and compares the search results against the reranker's results

  • It evaluates search quality by how closely the search ranking aligns with the reranker's ranking (assuming the reranker works well)

  • It is mostly a tool for quickly testing and tuning search parameters such as hybrid alpha and decay. It can also be used to test other factors that affect search, such as the prompt, embedding model, and quantization

  • Unlike answer_quality/run_qa, it doesn't need ground-truth labels. This enables quick and easy testing, including for queries without a clear "ground truth" ordering

  • It also keeps each query fixed across runs to enable fair comparisons (normally the query is modified before going into the search pipeline, so it can differ from run to run)

  • generate_search_queries.py is a helper tool that converts the queries and saves them, so the evaluation script can reuse the same modified queries
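
To illustrate the idea of saving modified queries so every evaluation run sees the same input, here is a minimal sketch. It is not the code from generate_search_queries.py: the function name `load_or_create_modified_queries`, the `modify` callable, and the JSON cache file are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_create_modified_queries(
    queries: list[str],
    modify: Callable[[str], str],
    cache_path: Path,
) -> dict[str, str]:
    """Map each original query to its modified form, reusing cached entries.

    `modify` stands in for whatever rewrites the query before search
    (hypothetical here). It is only called for queries missing from the
    cache, so repeated runs reuse identical modified queries.
    """
    cached: dict[str, str] = {}
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))

    for query in queries:
        if query not in cached:
            cached[query] = modify(query)

    # Persist the mapping so the next run skips the modification step.
    cache_path.write_text(json.dumps(cached, indent=2), encoding="utf-8")
    return cached
```

With a cache like this, changing a search parameter between runs changes only the search behavior, not the query text being searched.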

How Has This Been Tested?

  • Tested in a single-tenant environment. It is not currently intended for multi-tenant use, though it could be modified in the future so users can run tests on their own documents to tune their parameters
  • Tested with different LLM and reranker models

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@Orbital-Web Orbital-Web requested a review from a team as a code owner May 15, 2025 20:18

vercel bot commented May 15, 2025

The latest updates on your projects.

Name: internal-search
Status: ✅ Ready
Updated (UTC): May 15, 2025 10:07pm

Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR introduces a comprehensive search quality evaluation framework with scripts and configuration files to assess and tune search parameters by comparing search results against reranked results.

  • Added run_search_eval.py implementing metrics like Jaccard similarity and rank changes to evaluate search quality against reranker output
  • Added generate_search_queries.py to ensure consistent query modification across test runs using LLM and search tool interface
  • Implemented score-adjusted evaluation metrics in run_search_eval.py to handle varying relevance thresholds
  • Added detailed configuration templates (search_eval_config.yaml.template) for customizing search and evaluation parameters
  • Enhanced error handling in /backend/onyx/context/search/utils.py for stop word removal and logging
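
As a rough illustration of the kind of metrics named in this summary, the sketch below computes a top-k Jaccard similarity and an average rank change between the search ordering and the reranker ordering. It is not taken from run_search_eval.py: the function names, the use of document IDs as plain strings, and the exact definitions are assumptions.

```python
def jaccard_similarity(search_ids: list[str], reranked_ids: list[str], k: int) -> float:
    """Jaccard similarity between the top-k documents of the two rankings."""
    top_search = set(search_ids[:k])
    top_reranked = set(reranked_ids[:k])
    union = top_search | top_reranked
    if not union:
        return 1.0  # two empty result sets are treated as identical
    return len(top_search & top_reranked) / len(union)


def average_rank_change(search_ids: list[str], reranked_ids: list[str]) -> float:
    """Mean absolute difference in position for documents present in both rankings."""
    reranked_pos = {doc_id: pos for pos, doc_id in enumerate(reranked_ids)}
    shifts = [
        abs(pos - reranked_pos[doc_id])
        for pos, doc_id in enumerate(search_ids)
        if doc_id in reranked_pos
    ]
    return sum(shifts) / len(shifts) if shifts else 0.0


# Hypothetical document IDs, in the order each stage returned them.
search = ["a", "b", "c", "d"]
reranked = ["b", "a", "d", "c"]

print(jaccard_similarity(search, reranked, k=3))  # 0.5
print(average_rank_change(search, reranked))      # 1.0
```

A higher Jaccard value and a lower average rank change both indicate that the search ordering already sits close to what the reranker would produce.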

6 file(s) reviewed, 10 comment(s)

Orbital-Web and others added 2 commits May 15, 2025 13:32
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Contributor

@evan-onyx evan-onyx left a comment

A few nits, LGTM!

@Orbital-Web Orbital-Web added this pull request to the merge queue May 15, 2025
Merged via the queue into main with commit 30d9ce1 May 16, 2025
11 checks passed
@Orbital-Web Orbital-Web deleted the tests/search-quality-eval branch May 16, 2025 00:49
ferdinandl007 pushed a commit to ferdinandl007/onyx that referenced this pull request May 19, 2025
* fix: import order

* test examples

* fix: import

* wip: reranker based eval

* fix: import order

* feat: adjusted score

* fix: mypy

* fix: suggestions

* sorry cvs, you must go

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: mypy

* fix: suggestions

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request May 27, 2025
ZhipengHe pushed a commit to ZhipengHe/onyx that referenced this pull request Jun 6, 2025
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025