JUD-1861: Custom Scorer Overwrite #604

alanzhang25 · 2025-10-15T21:36:04Z

📝 Summary

1. ...

✅ Checklist

Docs updated (if necessary)
Changelogs are updated (if necessary)

linear · 2025-10-15T21:36:06Z

JUD-1861 Add overwrite functionality for custom scorers

greptile-apps

Greptile Overview

Greptile Summary

This PR implements the "Custom Scorer Overwrite" feature (JUD-1861) that allows users to overwrite existing custom scorers during upload operations. The changes span across the API types, CLI interface, and core upload functionality to provide consistent overwrite behavior throughout the judgeval system.

The implementation adds an optional overwrite boolean parameter (defaulting to False) to the CustomScorerUploadPayload data structures in both the auto-generated API types and judgment types files. This parameter is then exposed through the CLI with --overwrite/-o flags and integrated into the main upload_custom_scorer function in __init__.py. Additionally, validation logic was added to ensure scorer files contain exactly one scorer class definition, preventing common upload errors.

The changes maintain backward compatibility by making the overwrite parameter optional with a sensible default, ensuring existing code continues to work without modification while providing new functionality for users who need to replace existing scorers.

PR Description Notes:

The summary section is incomplete with placeholder text "- [ ] 1. ..."
Both checklist items remain unchecked despite the implementation being complete

Important Files Changed

Changed Files

Filename	Score	Overview
src/judgeval/data/judgment_types.py	5/5	Auto-generated file that adds optional `overwrite` field to `CustomScorerUploadPayload` model
src/judgeval/api/api_types.py	5/5	Auto-generated TypedDict that adds `overwrite` as `NotRequired[bool]` to payload structure
src/judgeval/cli.py	4/5	Adds `--overwrite/-o` CLI flag with proper help text and parameter passing
src/judgeval/init.py	2/5	Implements core overwrite functionality but includes flawed validation logic for scorer classes

Confidence score: 3/5

This PR introduces useful functionality but has implementation issues that need attention before merging
Score reduced due to critical flaw in scorer class validation logic that will fail on legitimate Python inheritance patterns and miss properly formatted classes
Pay close attention to src/judgeval/init.py - the validation logic needs to be rewritten to properly parse Python class definitions

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "CLI App"
    participant JudgmentClient
    participant FileSystem as "File System"
    participant JudgmentSyncClient as "API Client"
    participant Backend as "Judgment API"
    
    User->>CLI: "judgeval upload_scorer --scorer-file path/to/scorer.py --requirements path/to/requirements.txt --overwrite"
    CLI->>FileSystem: "Check if scorer file exists"
    FileSystem-->>CLI: "File exists confirmation"
    CLI->>FileSystem: "Check if requirements file exists"
    FileSystem-->>CLI: "File exists confirmation"
    CLI->>JudgmentClient: "create client instance"
    CLI->>JudgmentClient: "upload_custom_scorer(scorer_file_path, requirements_file_path, unique_name, overwrite)"
    JudgmentClient->>FileSystem: "Check scorer file exists"
    FileSystem-->>JudgmentClient: "File exists confirmation"
    JudgmentClient->>FileSystem: "Read scorer file content"
    FileSystem-->>JudgmentClient: "Scorer code content"
    JudgmentClient->>JudgmentClient: "Validate scorer class count"
    JudgmentClient->>FileSystem: "Read requirements file (if exists)"
    FileSystem-->>JudgmentClient: "Requirements content"
    JudgmentClient->>JudgmentSyncClient: "create API client"
    JudgmentClient->>JudgmentSyncClient: "upload_custom_scorer(payload)"
    JudgmentSyncClient->>Backend: "POST /upload_custom_scorer"
    Backend-->>JudgmentSyncClient: "Upload response"
    JudgmentSyncClient-->>JudgmentClient: "Response with status"
    JudgmentClient-->>CLI: "Upload success/failure result"
    CLI-->>User: "Success/failure message"

_{4 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

src/judgeval/__init__.py

propel-code-bot · 2025-10-16T06:38:44Z

src/judgeval/__init__.py

+        # Check for multiple scorer class definitions
+        scorer_class_count = 0
+        for line in scorer_code.split("\n"):
+            if "class" in line and "ExampleScorer" in line:
+                scorer_class_count += 1
+
+        if scorer_class_count > 1:
+            error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file."
+            judgeval_logger.error(error_msg)
+            raise ValueError(error_msg)
+        elif scorer_class_count == 0:
+            error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
+            judgeval_logger.error(error_msg)
+            raise ValueError(error_msg)


[BestPractice]

The class detection logic is too simplistic and brittle. It will produce false positives for:

Comments: # This class inherits from ExampleScorer

Strings: "The class ExampleScorer is useful"

Imports: from package import CustomExampleScorer as ExampleScorer

Unrelated classes: class MyExampleScorerHelper:

Suggested Change

Suggested change

# Check for multiple scorer class definitions

scorer_class_count = 0

for line in scorer_code.split("\n"):

if "class" in line and "ExampleScorer" in line:

scorer_class_count += 1

if scorer_class_count > 1:

error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file."

judgeval_logger.error(error_msg)

raise ValueError(error_msg)

elif scorer_class_count == 0:

error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."

judgeval_logger.error(error_msg)

raise ValueError(error_msg)

# Check for multiple scorer class definitions

import ast

import inspect

try:

tree = ast.parse(scorer_code)

scorer_classes = []

for node in ast.walk(tree):

if isinstance(node, ast.ClassDef):

# Check if class inherits from or contains "Scorer" in bases

for base in node.bases:

if isinstance(base, ast.Name) and "Scorer" in base.id:

scorer_classes.append(node.name)

elif isinstance(base, ast.Attribute) and "Scorer" in base.attr:

scorer_classes.append(node.name)

if len(scorer_classes) > 1:

error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file."

judgeval_logger.error(error_msg)

raise ValueError(error_msg)

elif len(scorer_classes) == 0:

error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."

judgeval_logger.error(error_msg)

raise ValueError(error_msg)

except SyntaxError as e:

error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}"

judgeval_logger.error(error_msg)

raise ValueError(error_msg)

⚡ Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents

[**BestPractice**] The class detection logic is too simplistic and brittle. It will produce false positives for: 1. **Comments**: `# This class inherits from ExampleScorer` 2. **Strings**: `"The class ExampleScorer is useful"` 3. **Imports**: `from package import CustomExampleScorer as ExampleScorer` 4. **Unrelated classes**: `class MyExampleScorerHelper:` <details> <summary>Suggested Change</summary> ```suggestion # Check for multiple scorer class definitions import ast import inspect try: tree = ast.parse(scorer_code) scorer_classes = [] for node in ast.walk(tree): if isinstance(node, ast.ClassDef): # Check if class inherits from or contains "Scorer" in bases for base in node.bases: if isinstance(base, ast.Name) and "Scorer" in base.id: scorer_classes.append(node.name) elif isinstance(base, ast.Attribute) and "Scorer" in base.attr: scorer_classes.append(node.name) if len(scorer_classes) > 1: error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file." judgeval_logger.error(error_msg) raise ValueError(error_msg) elif len(scorer_classes) == 0: error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class." judgeval_logger.error(error_msg) raise ValueError(error_msg) except SyntaxError as e: error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}" judgeval_logger.error(error_msg) raise ValueError(error_msg) ``` ⚡ **Committable suggestion** Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. </details> File: src/judgeval/__init__.py Line: 129

actually do this first

Mandolaro

lgtm

Mandolaro

actually do the ast parsing

Mandolaro

Lgtm

alanzhang25 force-pushed the az-custom-scorer-updates branch from f4c825c to 89c5336 Compare October 15, 2025 22:03

alanzhang25 marked this pull request as ready for review October 16, 2025 06:36

greptile-apps bot reviewed Oct 16, 2025

View reviewed changes

src/judgeval/__init__.py Outdated Show resolved Hide resolved

propel-code-bot bot reviewed Oct 16, 2025

View reviewed changes

Mandolaro approved these changes Oct 16, 2025

View reviewed changes

Mandolaro requested changes Oct 16, 2025

View reviewed changes

alanzhang25 added 3 commits October 16, 2025 10:02

Custom Scorer Overwrite

1994bbf

extra check

aa057e9

update logic

b67c704

alanzhang25 force-pushed the az-custom-scorer-updates branch from 13eeb08 to b67c704 Compare October 16, 2025 17:07

Mandolaro approved these changes Oct 16, 2025

View reviewed changes

alanzhang25 merged commit ba30db8 into staging Oct 16, 2025
36 of 44 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

JUD-1861: Custom Scorer Overwrite #604

JUD-1861: Custom Scorer Overwrite #604

Uh oh!

alanzhang25 commented Oct 15, 2025

Uh oh!

linear bot commented Oct 15, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

propel-code-bot bot Oct 16, 2025

Uh oh!

Mandolaro Oct 16, 2025

Uh oh!

Mandolaro left a comment

Uh oh!

Mandolaro left a comment

Uh oh!

Mandolaro left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

JUD-1861: Custom Scorer Overwrite #604

JUD-1861: Custom Scorer Overwrite #604

Uh oh!

Conversation

alanzhang25 commented Oct 15, 2025

📝 Summary

✅ Checklist

Uh oh!

linear bot commented Oct 15, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 3/5

Sequence Diagram

Uh oh!

Uh oh!

propel-code-bot bot Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

Mandolaro Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

Mandolaro left a comment

Choose a reason for hiding this comment

Uh oh!

Mandolaro left a comment

Choose a reason for hiding this comment

Uh oh!

Mandolaro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants