-
Couldn't load subscription status.
- Fork 87
JUD-1861: Custom Scorer Overwrite #604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f4c825c to
89c5336
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR implements the "Custom Scorer Overwrite" feature (JUD-1861) that allows users to overwrite existing custom scorers during upload operations. The changes span across the API types, CLI interface, and core upload functionality to provide consistent overwrite behavior throughout the judgeval system.
The implementation adds an optional overwrite boolean parameter (defaulting to False) to the CustomScorerUploadPayload data structures in both the auto-generated API types and judgment types files. This parameter is then exposed through the CLI with --overwrite/-o flags and integrated into the main upload_custom_scorer function in __init__.py. Additionally, validation logic was added to ensure scorer files contain exactly one scorer class definition, preventing common upload errors.
The changes maintain backward compatibility by making the overwrite parameter optional with a sensible default, ensuring existing code continues to work without modification while providing new functionality for users who need to replace existing scorers.
PR Description Notes:
- The summary section is incomplete with placeholder text "- [ ] 1. ..."
- Both checklist items remain unchecked despite the implementation being complete
Important Files Changed
Changed Files
| Filename | Score | Overview |
|---|---|---|
| src/judgeval/data/judgment_types.py | 5/5 | Auto-generated file that adds optional overwrite field to CustomScorerUploadPayload model |
| src/judgeval/api/api_types.py | 5/5 | Auto-generated TypedDict that adds overwrite as NotRequired[bool] to payload structure |
| src/judgeval/cli.py | 4/5 | Adds --overwrite/-o CLI flag with proper help text and parameter passing |
| src/judgeval/init.py | 2/5 | Implements core overwrite functionality but includes flawed validation logic for scorer classes |
Confidence score: 3/5
- This PR introduces useful functionality but has implementation issues that need attention before merging
- Score reduced due to critical flaw in scorer class validation logic that will fail on legitimate Python inheritance patterns and miss properly formatted classes
- Pay close attention to src/judgeval/init.py - the validation logic needs to be rewritten to properly parse Python class definitions
Sequence Diagram
sequenceDiagram
participant User
participant CLI as "CLI App"
participant JudgmentClient
participant FileSystem as "File System"
participant JudgmentSyncClient as "API Client"
participant Backend as "Judgment API"
User->>CLI: "judgeval upload_scorer --scorer-file path/to/scorer.py --requirements path/to/requirements.txt --overwrite"
CLI->>FileSystem: "Check if scorer file exists"
FileSystem-->>CLI: "File exists confirmation"
CLI->>FileSystem: "Check if requirements file exists"
FileSystem-->>CLI: "File exists confirmation"
CLI->>JudgmentClient: "create client instance"
CLI->>JudgmentClient: "upload_custom_scorer(scorer_file_path, requirements_file_path, unique_name, overwrite)"
JudgmentClient->>FileSystem: "Check scorer file exists"
FileSystem-->>JudgmentClient: "File exists confirmation"
JudgmentClient->>FileSystem: "Read scorer file content"
FileSystem-->>JudgmentClient: "Scorer code content"
JudgmentClient->>JudgmentClient: "Validate scorer class count"
JudgmentClient->>FileSystem: "Read requirements file (if exists)"
FileSystem-->>JudgmentClient: "Requirements content"
JudgmentClient->>JudgmentSyncClient: "create API client"
JudgmentClient->>JudgmentSyncClient: "upload_custom_scorer(payload)"
JudgmentSyncClient->>Backend: "POST /upload_custom_scorer"
Backend-->>JudgmentSyncClient: "Upload response"
JudgmentSyncClient-->>JudgmentClient: "Response with status"
JudgmentClient-->>CLI: "Upload success/failure result"
CLI-->>User: "Success/failure message"
4 files reviewed, 1 comment
src/judgeval/__init__.py
Outdated
| # Check for multiple scorer class definitions | ||
| scorer_class_count = 0 | ||
| for line in scorer_code.split("\n"): | ||
| if "class" in line and "ExampleScorer" in line: | ||
| scorer_class_count += 1 | ||
|
|
||
| if scorer_class_count > 1: | ||
| error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file." | ||
| judgeval_logger.error(error_msg) | ||
| raise ValueError(error_msg) | ||
| elif scorer_class_count == 0: | ||
| error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class." | ||
| judgeval_logger.error(error_msg) | ||
| raise ValueError(error_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
The class detection logic is too simplistic and brittle. It will produce false positives for:
- Comments:
# This class inherits from ExampleScorer - Strings:
"The class ExampleScorer is useful" - Imports:
from package import CustomExampleScorer as ExampleScorer - Unrelated classes:
class MyExampleScorerHelper:
Suggested Change
| # Check for multiple scorer class definitions | |
| scorer_class_count = 0 | |
| for line in scorer_code.split("\n"): | |
| if "class" in line and "ExampleScorer" in line: | |
| scorer_class_count += 1 | |
| if scorer_class_count > 1: | |
| error_msg = f"Multiple Scorer classes found in {scorer_file_path}. Please only upload one scorer class per file." | |
| judgeval_logger.error(error_msg) | |
| raise ValueError(error_msg) | |
| elif scorer_class_count == 0: | |
| error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class." | |
| judgeval_logger.error(error_msg) | |
| raise ValueError(error_msg) | |
| # Check for multiple scorer class definitions | |
| import ast | |
| import inspect | |
| try: | |
| tree = ast.parse(scorer_code) | |
| scorer_classes = [] | |
| for node in ast.walk(tree): | |
| if isinstance(node, ast.ClassDef): | |
| # Check if class inherits from or contains "Scorer" in bases | |
| for base in node.bases: | |
| if isinstance(base, ast.Name) and "Scorer" in base.id: | |
| scorer_classes.append(node.name) | |
| elif isinstance(base, ast.Attribute) and "Scorer" in base.attr: | |
| scorer_classes.append(node.name) | |
| if len(scorer_classes) > 1: | |
| error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file." | |
| judgeval_logger.error(error_msg) | |
| raise ValueError(error_msg) | |
| elif len(scorer_classes) == 0: | |
| error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class." | |
| judgeval_logger.error(error_msg) | |
| raise ValueError(error_msg) | |
| except SyntaxError as e: | |
| error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}" | |
| judgeval_logger.error(error_msg) | |
| raise ValueError(error_msg) |
β‘ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
The class detection logic is too simplistic and brittle. It will produce false positives for:
1. **Comments**: `# This class inherits from ExampleScorer`
2. **Strings**: `"The class ExampleScorer is useful"`
3. **Imports**: `from package import CustomExampleScorer as ExampleScorer`
4. **Unrelated classes**: `class MyExampleScorerHelper:`
<details>
<summary>Suggested Change</summary>
```suggestion
# Check for multiple scorer class definitions
import ast
import inspect
try:
tree = ast.parse(scorer_code)
scorer_classes = []
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
# Check if class inherits from or contains "Scorer" in bases
for base in node.bases:
if isinstance(base, ast.Name) and "Scorer" in base.id:
scorer_classes.append(node.name)
elif isinstance(base, ast.Attribute) and "Scorer" in base.attr:
scorer_classes.append(node.name)
if len(scorer_classes) > 1:
error_msg = f"Multiple Scorer classes found in {scorer_file_path}: {', '.join(scorer_classes)}. Please only upload one scorer class per file."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
elif len(scorer_classes) == 0:
error_msg = f"No Scorer class was found in {scorer_file_path}. Please ensure the file contains a valid scorer class."
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
except SyntaxError as e:
error_msg = f"Invalid Python syntax in {scorer_file_path}: {e}"
judgeval_logger.error(error_msg)
raise ValueError(error_msg)
```
β‘ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: src/judgeval/__init__.py
Line: 129There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually do this first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually do the ast parsing
13eeb08 to
b67c704
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
π Summary
β Checklist