Skip to content

Conversation

@alanzhang25
Copy link
Collaborator

πŸ“ Summary

  • 1. ...

βœ… Checklist

@linear
Copy link

linear bot commented Oct 15, 2025

@alanzhang25 alanzhang25 force-pushed the az-custom-scorer-updates branch from f4c825c to 89c5336 Compare October 15, 2025 22:03
@alanzhang25 alanzhang25 marked this pull request as ready for review October 16, 2025 06:36
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR implements the "Custom Scorer Overwrite" feature (JUD-1861) that allows users to overwrite existing custom scorers during upload operations. The changes span across the API types, CLI interface, and core upload functionality to provide consistent overwrite behavior throughout the judgeval system.

The implementation adds an optional overwrite boolean parameter (defaulting to False) to the CustomScorerUploadPayload data structures in both the auto-generated API types and judgment types files. This parameter is then exposed through the CLI with --overwrite/-o flags and integrated into the main upload_custom_scorer function in __init__.py. Additionally, validation logic was added to ensure scorer files contain exactly one scorer class definition, preventing common upload errors.

The changes maintain backward compatibility by making the overwrite parameter optional with a sensible default, ensuring existing code continues to work without modification while providing new functionality for users who need to replace existing scorers.

PR Description Notes:

  • The summary section is incomplete with placeholder text "- [ ] 1. ..."
  • Both checklist items remain unchecked despite the implementation being complete

Important Files Changed

Changed Files
Filename Score Overview
src/judgeval/data/judgment_types.py 5/5 Auto-generated file that adds optional overwrite field to CustomScorerUploadPayload model
src/judgeval/api/api_types.py 5/5 Auto-generated TypedDict that adds overwrite as NotRequired[bool] to payload structure
src/judgeval/cli.py 4/5 Adds --overwrite/-o CLI flag with proper help text and parameter passing
src/judgeval/init.py 2/5 Implements core overwrite functionality but includes flawed validation logic for scorer classes

Confidence score: 3/5

  • This PR introduces useful functionality but has implementation issues that need attention before merging
  • Score reduced due to critical flaw in scorer class validation logic that will fail on legitimate Python inheritance patterns and miss properly formatted classes
  • Pay close attention to src/judgeval/init.py - the validation logic needs to be rewritten to properly parse Python class definitions

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "CLI App"
    participant JudgmentClient
    participant FileSystem as "File System"
    participant JudgmentSyncClient as "API Client"
    participant Backend as "Judgment API"
    
    User->>CLI: "judgeval upload_scorer --scorer-file path/to/scorer.py --requirements path/to/requirements.txt --overwrite"
    CLI->>FileSystem: "Check if scorer file exists"
    FileSystem-->>CLI: "File exists confirmation"
    CLI->>FileSystem: "Check if requirements file exists"
    FileSystem-->>CLI: "File exists confirmation"
    CLI->>JudgmentClient: "create client instance"
    CLI->>JudgmentClient: "upload_custom_scorer(scorer_file_path, requirements_file_path, unique_name, overwrite)"
    JudgmentClient->>FileSystem: "Check scorer file exists"
    FileSystem-->>JudgmentClient: "File exists confirmation"
    JudgmentClient->>FileSystem: "Read scorer file content"
    FileSystem-->>JudgmentClient: "Scorer code content"
    JudgmentClient->>JudgmentClient: "Validate scorer class count"
    JudgmentClient->>FileSystem: "Read requirements file (if exists)"
    FileSystem-->>JudgmentClient: "Requirements content"
    JudgmentClient->>JudgmentSyncClient: "create API client"
    JudgmentClient->>JudgmentSyncClient: "upload_custom_scorer(payload)"
    JudgmentSyncClient->>Backend: "POST /upload_custom_scorer"
    Backend-->>JudgmentSyncClient: "Upload response"
    JudgmentSyncClient-->>JudgmentClient: "Response with status"
    JudgmentClient-->>CLI: "Upload success/failure result"
    CLI-->>User: "Success/failure message"
Loading

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines 116 to 129
# Check for multiple scorer class definitions
scorer_class_count = 0
for line in scorer_code.split("\n"):
if "class" in line and "ExampleScorer" in line:
scorer_class_count += 1

if scorer_class_count > 1:
error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
elif scorer_class_count == 0:
error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

The class detection logic is too simplistic and brittle. It will produce false positives for:

  1. Comments: # This class inherits from ExampleScorer
  2. Strings: "The class ExampleScorer is useful"
  3. Imports: from package import CustomExampleScorer as ExampleScorer
  4. Unrelated classes: class MyExampleScorerHelper:
Suggested Change
Suggested change
# Check for multiple scorer class definitions
scorer_class_count = 0
for line in scorer_code.split("\n"):
if "class" in line and "ExampleScorer" in line:
scorer_class_count += 1
if scorer_class_count > 1:
error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
elif scorer_class_count == 0:
error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
# Check for multiple scorer class definitions
import ast
import inspect
try:
tree = ast.parse(scorer_code)
scorer_classes = []
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
# Check if class inherits from or contains "Scorer" in bases
for base in node.bases:
if isinstance(base, ast.Name) and "Scorer" in base.id:
scorer_classes.append(node.name)
elif isinstance(base, ast.Attribute) and "Scorer" in base.attr:
scorer_classes.append(node.name)
if len(scorer_classes) > 1:
error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
elif len(scorer_classes) == 0:
error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
except SyntaxError as e:
error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}"
judgeval_logger.error(error_msg)
raise ValueError(error_msg)

⚑ Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

The class detection logic is too simplistic and brittle. It will produce false positives for:

1. **Comments**: `# This class inherits from ExampleScorer`
2. **Strings**: `"The class ExampleScorer is useful"`
3. **Imports**: `from package import CustomExampleScorer as ExampleScorer`
4. **Unrelated classes**: `class MyExampleScorerHelper:`

<details>
<summary>Suggested Change</summary>

```suggestion
# Check for multiple scorer class definitions
import ast
import inspect

try:
    tree = ast.parse(scorer_code)
    scorer_classes = []
    
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Check if class inherits from or contains "Scorer" in bases
            for base in node.bases:
                if isinstance(base, ast.Name) and "Scorer" in base.id:
                    scorer_classes.append(node.name)
                elif isinstance(base, ast.Attribute) and "Scorer" in base.attr:
                    scorer_classes.append(node.name)
    
    if len(scorer_classes) > 1:
        error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file."
        judgeval_logger.error(error_msg)
        raise ValueError(error_msg)
    elif len(scorer_classes) == 0:
        error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
        judgeval_logger.error(error_msg)
        raise ValueError(error_msg)
except SyntaxError as e:
    error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}"
    judgeval_logger.error(error_msg)
    raise ValueError(error_msg)
```

⚑ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: src/judgeval/__init__.py
Line: 129

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually do this first

Copy link
Collaborator

@Mandolaro Mandolaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@Mandolaro Mandolaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually do the ast parsing

@alanzhang25 alanzhang25 force-pushed the az-custom-scorer-updates branch from 13eeb08 to b67c704 Compare October 16, 2025 17:07
Copy link
Collaborator

@Mandolaro Mandolaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@alanzhang25 alanzhang25 merged commit ba30db8 into staging Oct 16, 2025
36 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants