
Conversation

Contributor

@rguan72 rguan72 commented Sep 17, 2025

Description

[Provide a brief description of the changes in this PR]

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Adds a Braintrust-backed evaluation pipeline with a local CLI, a secure EE API endpoint, and a Celery task to run evals in CI/CD. Enables running evals on local data or Braintrust datasets without DB side effects.

  • New Features

    • Eval framework (backend/onyx/evals): configuration models, provider abstraction, Braintrust provider, and isolated transactional DB sessions.
    • EE API: POST /api/evals/eval_run (cloud superuser only) enqueues eval_run_task via Celery.
    • Celery task: eval_run_task executes evaluations using EvalConfigurationOptions.
    • CLI: backend/onyx/evals/eval_cli.py supports local runs or remote triggers; VS Code launch “Eval CLI” added.
    • Docs: backend/onyx/evals/README.md; script to create Braintrust datasets.
    • Chat: prepare_chat_message_request now supports llm_override and allowed_tool_ids for eval control.
    • Config: BRAINTRUST_* envs and max concurrency; .gitignore updated for eval data.
  • Dependencies

    • Adds braintrust, braintrust-langchain, autoevals to backend requirements.
    • Celery app loads onyx.background.celery.tasks.evals.

@rguan72 rguan72 requested a review from a team as a code owner September 17, 2025 17:03

vercel bot commented Sep 17, 2025

The latest updates on your projects.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | Ready | Preview | Comment | Sep 17, 2025 6:26pm |

@rguan72 rguan72 changed the title Eval Pipeline 2 for ci/cd fix feat: Eval Pipeline 2 for ci/cd fix Sep 17, 2025

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR implements a comprehensive evaluation pipeline system for Onyx, enabling automated testing and quality assessment of chat and retrieval functionality. The changes introduce a Braintrust-based evaluation framework that supports both local development testing and remote CI/CD integration.

The core implementation includes:

  • Evaluation Infrastructure: New models (EvalConfiguration, EvalConfigurationOptions) define evaluation settings including persona overrides, tool restrictions, and LLM configurations
  • Provider Architecture: Abstract EvalProvider interface with concrete BraintrustEvalProvider implementation for running evaluations against datasets
  • Database Isolation: Ephemeral session factory ensures evaluations don't affect production data by rolling back all database changes
  • CLI Integration: New eval_cli.py provides command-line tools for running evaluations locally or triggering remote execution via API
  • Background Processing: Celery task integration allows evaluations to run asynchronously without blocking API responses
  • Enterprise API: New /evals/eval_run endpoint restricted to cloud superusers for triggering evaluation jobs

The system integrates deeply with existing Onyx components by extending chat message request preparation to support LLM overrides and tool restrictions, updating the Celery autodiscovery to include evaluation tasks, and modifying built-in tool type definitions for better evaluation compatibility. Configuration management is enhanced with new Braintrust-specific environment variables for API keys, project names, and concurrency limits.

The evaluation pipeline supports different research types (DEEP/THOUGHTFUL) and can work with both local JSON datasets and remote Braintrust datasets, providing flexibility for various testing scenarios from development to production monitoring.

Confidence score: 2/5

  • This PR has several critical issues that make it unsafe to merge without significant fixes
  • Score lowered due to incomplete implementations, missing configurations, potential type safety issues, and debugging code left in production files
  • Pay close attention to backend/onyx/evals/providers/braintrust.py, backend/onyx/tools/built_in_tools.py, and backend/onyx/evals/models.py; these files contain the most problematic code that needs immediate attention

Context used:

Rule - Remove temporary debugging code before merging to production, especially tenant-specific debugging logs. (link)

26 files reviewed, 17 comments


Comment on lines +182 to +183:
```python
dataset_size = len(list(dataset.fetch()))
print(f"Dataset size: {dataset_size}")
```

style: inefficient dataset size calculation; `list(dataset.fetch())` loads the entire dataset into memory just to count items
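The fix the reviewer implies is to count the stream one item at a time. A minimal sketch (`count_streaming` is an illustrative helper, and the iterator stands in for whatever `dataset.fetch()` returns):

```python
def count_streaming(records):
    """Count items one at a time instead of materializing a list.

    len(list(it)) holds every record in memory simultaneously;
    sum(1 for _ in it) keeps only the current record.
    """
    return sum(1 for _ in records)


print(count_streaming(iter(range(1000))))  # → 1000
```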

Comment on lines +199 to +200:
```python
args.remote_dataset_name,
search_permissions_email=args.search_permissions_email,
```

logic: `args.remote_dataset_name` could be None when using `--local-data-path`, causing a TypeError

Suggested change:
```diff
-        args.remote_dataset_name,
-        search_permissions_email=args.search_permissions_email,
+        if not args.remote_dataset_name:
+            print("Error: --remote-dataset-name is required for remote evaluation")
+            return
+        try:
+            result = run_remote(
+                args.base_url,
+                api_key,
+                args.remote_dataset_name,
+                search_permissions_email=args.search_permissions_email,
+            )
```

Comment context:
```python
    else None
)
research_type = ResearchType(eval_input.get("research_type", "THOUGHTFUL"))
print(eval_input)
```

syntax: Remove debugging print statement before merging to production

Suggested change:
```diff
-print(eval_input)
+research_type = ResearchType(eval_input.get("research_type", "THOUGHTFUL"))
```

Context Used: Rule - Remove temporary debugging code before merging to production, especially tenant-specific debugging logs. (link)

```python
persona_override_config: PersonaOverrideConfig | None = None
llm: LLMOverride = Field(default_factory=LLMOverride)
search_permissions_email: str | None = None
allowed_tool_ids: list[int]
```

logic: Missing default value for required field. This will cause Pydantic validation errors when creating EvalConfiguration instances without explicit allowed_tool_ids.

Suggested change:
```diff
-allowed_tool_ids: list[int]
+allowed_tool_ids: list[int] = Field(default_factory=list)
```
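The required-field-vs-default distinction behind this suggestion has a stdlib analogue in dataclasses. This is a sketch of the pattern, not the PR's actual Pydantic model:

```python
from dataclasses import dataclass, field


@dataclass
class EvalConfigSketch:
    # Without a default this field would be required at construction
    # time; default_factory gives each instance its own fresh list,
    # avoiding the shared-mutable-default bug that `= []` would cause.
    allowed_tool_ids: list[int] = field(default_factory=list)


a = EvalConfigSketch()
b = EvalConfigSketch()
a.allowed_tool_ids.append(7)
print(a.allowed_tool_ids, b.allowed_tool_ids)  # → [7] []
```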

```python
)


class EvalationAck(BaseModel):
```

syntax: Class name has typo: 'EvalationAck' should be 'EvaluationAck'

Suggested change:
```diff
-class EvalationAck(BaseModel):
+class EvaluationAck(BaseModel):
```

Comment on lines +25 to +26:
> make sure your AUTH_TYPE=disabled when running evals locally. Save the env var ONYX_EVAL_API_KEY in your .env file so you don't have to specify it every time for triggering remote runs.

style: Missing punctuation and formatting issues in the sentence about AUTH_TYPE and environment variables

Comment on lines +33 to +35:
```python
except Exception:
    logger.error("Failed to run eval task")
    raise
```

logic: Generic exception handling loses valuable error context. Consider logging the exception details with logger.exception() or including specific error information.

Suggested change:
```diff
 except Exception:
-    logger.error("Failed to run eval task")
+    logger.exception("Failed to run eval task")
     raise
```

Comment on lines +31 to +38:
```python
Eval(
    name=BRAINTRUST_PROJECT,
    data=eval_data,
    task=task,
    scores=[],
    metadata={**configuration.model_dump()},
    max_concurrency=BRAINTRUST_MAX_CONCURRENCY,
)
```

logic: The Eval() call result is not captured or awaited. This means the evaluation runs but results are lost and there's no error handling.

Suggested change:
```diff
-Eval(
+eval_result = Eval(
     name=BRAINTRUST_PROJECT,
     data=eval_data,
     task=task,
     scores=[],
     metadata={**configuration.model_dump()},
     max_concurrency=BRAINTRUST_MAX_CONCURRENCY,
 )
```

Comment on lines +47 to +54:
```python
Eval(
    name=BRAINTRUST_PROJECT,
    data=eval_cases,
    task=task,
    scores=[],
    metadata={**configuration.model_dump()},
    max_concurrency=BRAINTRUST_MAX_CONCURRENCY,
)
```

logic: Same issue: the `Eval()` result is not captured. Both evaluation paths have this critical problem.

Suggested change:
```diff
-Eval(
+eval_result = Eval(
     name=BRAINTRUST_PROJECT,
     data=eval_cases,
     task=task,
     scores=[],
     metadata={**configuration.model_dump()},
     max_concurrency=BRAINTRUST_MAX_CONCURRENCY,
 )
```

Comment on lines +40 to +43:
```python
if data is None:
    raise ValueError(
        "Must specify data when remote_dataset_name is not specified"
    )
```

logic: This validation check is redundant - if remote_dataset_name is None, we're already in the else branch, so data cannot also be None due to the earlier validation.
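One way to make the reviewer's point concrete is to validate the data/remote_dataset_name pair exactly once, up front, so no later branch needs a redundant None re-check. The parameter names mirror the PR, but the function itself is an illustrative sketch:

```python
def resolve_eval_data(remote_dataset_name, data):
    """Validate the pair once; later branches can rely on the invariant."""
    # Exactly one of the two sources must be provided.
    if (remote_dataset_name is None) == (data is None):
        raise ValueError("Specify exactly one of data or remote_dataset_name")
    if remote_dataset_name is not None:
        return ("remote", remote_dataset_name)
    # data is guaranteed non-None here -- no second check needed.
    return ("local", data)


print(resolve_eval_data(None, [{"question": "hi"}]))
```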


@cubic-dev-ai cubic-dev-ai bot left a comment


28 issues found across 27 files

Prompt for AI agents (all 28 issues)

Understand the root cause of the following 28 issues and fix them.


<file name="backend/onyx/evals/provider.py">

<violation number="1" location="backend/onyx/evals/provider.py:2">
Hard dependency on optional braintrust package causes import-time crash (ModuleNotFoundError)</violation>
</file>

<file name="backend/onyx/db/tools.py">

<violation number="1" location="backend/onyx/db/tools.py:12">
Top-level import introduces a tools-layer runtime dependency in the DB module, undermining the local-import pattern meant to avoid circular imports and tight coupling.</violation>
</file>

<file name="backend/onyx/evals/one_off/create_braintrust_dataset.py">

<violation number="1" location="backend/onyx/evals/one_off/create_braintrust_dataset.py:63">
Filtering does not enforce categories containing 'web-only' as stated; the condition only checks should_use and question, leading to incorrect dataset contents.</violation>

<violation number="2" location="backend/onyx/evals/one_off/create_braintrust_dataset.py:69">
Typo in user-facing string: "quesiton" should be "question".</violation>

<violation number="3" location="backend/onyx/evals/one_off/create_braintrust_dataset.py:143">
Avoid user-specific absolute path as default for --csv-path; use a repo-relative path derived from __file__ for portability.</violation>
</file>

<file name="backend/onyx/evals/models.py">

<violation number="1" location="backend/onyx/evals/models.py:45">
Redundant DB lookups in list comprehensions; cache tool IDs to avoid calling get_builtin_tool twice per tool.</violation>

<violation number="2" location="backend/onyx/evals/models.py:69">
Typo in class name "EvalationAck"; consider renaming to "EvaluationAck" for clarity and consistency across the codebase.</violation>

<violation number="3" location="backend/onyx/evals/models.py:77">
Callable type for task may be inconsistent with data item structure; consider aligning the task signature with the data element type (e.g., Callable[[dict[str, dict[str, str]]], str]) or adjusting data typing.</violation>
</file>

<file name="backend/ee/onyx/server/evals/api.py">

<violation number="1" location="backend/ee/onyx/server/evals/api.py:5">
Use the versioned Celery client app import for consistency and correct EE/OSS wiring.</violation>

<violation number="2" location="backend/ee/onyx/server/evals/api.py:24">
Docstring claims API key auth, but the dependency enforces superuser access; update the description to reflect actual auth.</violation>
</file>

<file name="backend/onyx/evals/eval.py">

<violation number="1" location="backend/onyx/evals/eval.py:75">
Avoid printing potentially sensitive eval_input; use structured logging with redaction or remove debug output.</violation>
</file>

<file name="backend/onyx/evals/eval_cli.py">

<violation number="1" location="backend/onyx/evals/eval_cli.py:114">
Network request lacks a timeout; add a finite timeout to avoid indefinite hangs and improve resiliency.</violation>

<violation number="2" location="backend/onyx/evals/eval_cli.py:188">
Empty API key fallback results in sending an empty Bearer token; validate and fail fast when no API key is provided.</violation>

<violation number="3" location="backend/onyx/evals/eval_cli.py:199">
run_remote is invoked with possibly None dataset_name/search_permissions_email, resulting in JSON nulls and server-side failure.</violation>

<violation number="4" location="backend/onyx/evals/eval_cli.py:204">
Printing the raw exception may leak sensitive details; prefer logging only safe metadata (e.g., HTTP status code) or a generic message.

(Based on your team's feedback about avoiding logging raw exception strings that may contain URLs with temporary auth tokens.)</violation>
</file>

<file name="backend/onyx/evals/providers/braintrust.py">

<violation number="1" location="backend/onyx/evals/providers/braintrust.py:32">
Eval name uses BRAINTRUST_PROJECT (project) instead of configuration.dataset_name; this likely mislabels eval runs and leaves dataset_name unused.</violation>
</file>

<file name="backend/requirements/default.txt">

<violation number="1" location="backend/requirements/default.txt:108">
New dependency appears unused across the repository; consider avoiding adding unused runtime deps or move to a dev/CI-only requirements file to reduce image size and attack surface.</violation>

<violation number="2" location="backend/requirements/default.txt:109">
Integration package appears unused; adding unused deps increases maintenance burden. Consider deferring until referenced in code or include only in eval/CI requirements.</violation>

<violation number="3" location="backend/requirements/default.txt:110">
Added dependency seems unused; avoid introducing unused runtime packages or move to a dedicated eval/CI requirements file.</violation>
</file>

<file name="backend/onyx/configs/app_configs.py">

<violation number="1" location="backend/onyx/configs/app_configs.py:697">
Zero/negative concurrency allowed; clamp to a minimum of 1 to avoid disabling evaluations or runtime errors.</violation>
</file>

<file name="backend/onyx/evals/README.md">

<violation number="1" location="backend/onyx/evals/README.md:17">
Duplicate --remote flag in example command; remove the redundancy.</violation>

<violation number="2" location="backend/onyx/evals/README.md:23">
Incorrect flag name: use --local-data-path (not --local-dataset-path).</violation>

<violation number="3" location="backend/onyx/evals/README.md:39">
VS Code configuration "Eval CLI" does not exist in the repo templates; instruction is misleading.</violation>

<violation number="4" location="backend/onyx/evals/README.md:43">
Claims a default data file path that does not exist; remove or correct the path/default description.</violation>

<violation number="5" location="backend/onyx/evals/README.md:49">
Incorrect default documented for --local-data-path; the CLI sets no default.</violation>

<violation number="6" location="backend/onyx/evals/README.md:51">
Misleading: --braintrust-project does not override provider’s BRAINTRUST_PROJECT; it’s used only for dataset lookup.</violation>

<violation number="7" location="backend/onyx/evals/README.md:56">
Path `evals/data/data.json` does not exist in the repo; reference should not promise a fixed path.</violation>

<violation number="8" location="backend/onyx/evals/README.md:58">
Incorrect data shape: research_type belongs inside input, not at the top level.</violation>
</file>


@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/provider.py, line 2:

```diff
@@ -0,0 +1,6 @@
+from onyx.evals.models import EvalProvider
+from onyx.evals.providers.braintrust import BraintrustEvalProvider
```

Hard dependency on the optional braintrust package causes an import-time crash (ModuleNotFoundError).
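A common way to avoid this class of crash is to defer the import until the provider is actually requested. This is a sketch of the pattern, not the PR's code; the module names below are illustrative:

```python
import importlib


def load_optional(module_name):
    """Import an optional dependency lazily, returning None when absent.

    Callers can then raise a clear, actionable error at use time
    instead of crashing the whole module at import time.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# An installed stdlib module loads; a missing optional dep degrades to None.
print(load_optional("json") is not None)          # → True
print(load_optional("braintrust_not_installed"))  # → None
```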


@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/db/tools.py, line 12:

```diff
 from onyx.db.models import Tool
 from onyx.server.features.tool.models import Header
+from onyx.tools.built_in_tools import BUILT_IN_TOOL_TYPES
 from onyx.utils.headers import HeaderItemDict
 from onyx.utils.logger import setup_logger
```

Top-level import introduces a tools-layer runtime dependency in the DB module, undermining the local-import pattern meant to avoid circular imports and tight coupling.

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/one_off/create_braintrust_dataset.py, line 143:

```python
)
parser.add_argument(
    "--csv-path",
    default="/Users/richardguan/onyx/backend/onyx/evals/data/DR Master Question & Metric Sheet - Sheet1.csv",
    help="Path to the CSV file (default: %(default)s)",
)
```

Avoid a user-specific absolute path as the default for `--csv-path`; use a repo-relative path derived from `__file__` for portability.
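The portable default the reviewer suggests can be sketched with pathlib. The `data/eval.csv` filename below is illustrative, not necessarily the PR's actual file:

```python
from pathlib import Path

# Derive the default from this file's location rather than hard-coding
# a user-specific absolute path, so the script works from any checkout.
DEFAULT_CSV = Path(__file__).resolve().parent / "data" / "eval.csv"

print(DEFAULT_CSV.is_absolute(), DEFAULT_CSV.name)  # → True eval.csv
```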

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/one_off/create_braintrust_dataset.py, line 69:

```python
[
    {
        "question": question
        + ". All info is contained in the quesiton. DO NOT ask any clarifying questions.",
        "research_type": "DEEP",
        "categories": categories,
```

Typo in the user-facing string: "quesiton" should be "question".

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/one_off/create_braintrust_dataset.py, line 63:

```python
categories = row[12].strip() if len(row) > 12 else ""

# Filter records: should_use = TRUE and categories contains "web-only"
if should_use == "TRUE" and question:  # Ensure question is not empty
```

Filtering does not enforce categories containing "web-only" as the comment states; the condition only checks should_use and question, leading to incorrect dataset contents.

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/README.md, line 43:

> 4. Click the play button or press F5
>
> This will run the evaluation with the following default settings:
> - Uses the local data file at `evals/data/data.json`

Claims a default data file path that does not exist; remove or correct the path/default description.


@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/README.md, line 39:

> 1. Open VS Code in the project root
> 2. Go to the "Run and Debug" panel (Ctrl/Cmd + Shift + D)
> 3. Select "Eval CLI" from the dropdown

The VS Code configuration "Eval CLI" does not exist in the repo templates; the instruction is misleading.

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/README.md, line 23:

> You can also run the CLI directly from the command line:

```bash
onyx$ python -m dotenv -f .vscode/.env run -- python backend/onyx/evals/eval_cli.py --local-dataset-path backend/onyx/evals/data/eval.json --search-permissions-email richard@onyx.app
```

Incorrect flag name: use `--local-data-path` (not `--local-dataset-path`).

Suggested change:
```diff
-onyx$ python -m dotenv -f .vscode/.env run -- python backend/onyx/evals/eval_cli.py --local-dataset-path backend/onyx/evals/data/eval.json --search-permissions-email richard@onyx.app
+onyx$ python -m dotenv -f .vscode/.env run -- python backend/onyx/evals/eval_cli.py --local-data-path backend/onyx/evals/data/eval.json --search-permissions-email richard@onyx.app
```


@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/README.md, line 17:

> Kick off a remote job

```bash
onyx/backend$ python onyx/evals/eval_cli.py --remote --api-key <SUPER_CLOUD_USER_API_KEY> --search-permissions-email <email account to reference> --remote --remote-dataset-name Simple
```

Duplicate `--remote` flag in the example command; remove the redundancy.

Suggested change:
```diff
-onyx/backend$ python onyx/evals/eval_cli.py --remote --api-key <SUPER_CLOUD_USER_API_KEY> --search-permissions-email <email account to reference> --remote --remote-dataset-name Simple
+onyx/backend$ python onyx/evals/eval_cli.py --remote --api-key <SUPER_CLOUD_USER_API_KEY> --search-permissions-email <email account to reference> --remote-dataset-name Simple
```

@cubic-dev-ai bot commented Sep 17, 2025 on backend/onyx/evals/eval_cli.py, line 199:

```python
result = run_remote(
    args.base_url,
    api_key,
    args.remote_dataset_name,
    search_permissions_email=args.search_permissions_email,
)
```

run_remote is invoked with a possibly None dataset_name/search_permissions_email, resulting in JSON nulls and server-side failure.
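The fail-fast validation this comment (and the earlier timeout/empty-API-key findings) call for can be sketched with argparse. The flag names mirror the PR's CLI, but the function is illustrative, not the PR's actual parser:

```python
import argparse


def parse_eval_args(argv):
    """Reject invalid flag combinations before any network call is made.

    parser.error() prints a usage message and exits, so None values are
    never forwarded to the server as JSON nulls.
    """
    parser = argparse.ArgumentParser(prog="eval_cli")
    parser.add_argument("--remote", action="store_true")
    parser.add_argument("--remote-dataset-name")
    parser.add_argument("--local-data-path")
    parser.add_argument("--api-key")
    args = parser.parse_args(argv)
    if args.remote and not args.remote_dataset_name:
        parser.error("--remote-dataset-name is required with --remote")
    if args.remote and not args.api_key:
        parser.error("--api-key is required with --remote")
    return args


args = parse_eval_args(
    ["--remote", "--api-key", "key123", "--remote-dataset-name", "Simple"]
)
print(args.remote_dataset_name)  # → Simple
```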
