-
Notifications
You must be signed in to change notification settings - Fork 87
Fix Sharing SpanStore #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @Mandolaro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue in the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with π and π on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR fixes a critical concurrency issue in the tracing system by introducing trace-specific span management. The changes refactor the shared span store architecture to prevent data corruption when multiple training operations run concurrently.
The SpanStore
class is redesigned to organize spans by trace ID using a dictionary structure (_spans_by_trace
) instead of a flat list. This includes new methods get_by_trace_id()
and clear_trace()
for trace-specific operations, enabling O(1) access to spans for a given trace rather than O(n) linear searches.
The FireworksTrainer
is updated to use these trace-specific operations. The _extract_message_history_from_spans
method now accepts a trace_id
parameter and filters spans accordingly. During training execution, the code extracts the current trace ID from the span context and uses it for both message history extraction and cleanup, replacing the previous global span clearing that could affect concurrent operations.
These changes ensure proper isolation between concurrent training operations, where each trace only accesses its own spans, making the system safe for multi-threaded or multi-process environments.
Important Files Changed
Changed Files
Filename | Score | Overview |
---|---|---|
src/judgeval/tracer/exporters/store.py | 4/5 | Refactored SpanStore to use trace-grouped dictionary structure with new trace-specific methods |
src/judgeval/trainer/fireworks_trainer.py | 4/5 | Updated training logic to use trace-specific span operations and proper trace ID extraction |
Confidence score: 4/5
- This PR addresses a critical concurrency bug and is generally safe to merge with good architectural improvements
- Score reflects well-structured changes that follow good software design patterns, but deducted one point due to potential edge cases around span context availability and error handling
- Pay close attention to the trace ID extraction logic in fireworks_trainer.py and ensure proper testing of concurrent scenarios
Sequence Diagram
sequenceDiagram
participant User
participant FireworksTrainer
participant SpanStore
participant Tracer
participant AgentFunction
participant JudgmentClient
User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
loop for each training step
FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
loop for each prompt and generation
FireworksTrainer->>Tracer: "@tracer.observe(span_type='function')"
FireworksTrainer->>AgentFunction: "await agent_function(**prompt_input)"
AgentFunction-->>FireworksTrainer: "response_data"
FireworksTrainer->>Tracer: "get_current_span()"
Tracer-->>FireworksTrainer: "current_span"
FireworksTrainer->>FireworksTrainer: "extract trace_id from span context"
FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
SpanStore-->>FireworksTrainer: "spans for trace"
FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
JudgmentClient-->>FireworksTrainer: "scoring_results"
FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
SpanStore->>SpanStore: "delete spans for trace_id"
end
FireworksTrainer->>FireworksTrainer: "perform_reinforcement_step(dataset)"
end
FireworksTrainer-->>User: "ModelConfig"
2 files reviewed, 3 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the SpanStore
to be trace-aware by partitioning spans by trace_id
. This is an excellent approach to address potential concurrency issues arising from a shared SpanStore
in an asynchronous environment. The changes in fireworks_trainer.py
correctly utilize this new functionality to isolate spans for each generation, which should resolve the underlying issue of mixed-up traces. The implementation is solid, and I've added a few minor suggestions to improve code conciseness and adhere to Pythonic idioms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR fixes a critical concurrency issue in the FireworksTrainer where multiple training operations were sharing a single SpanStore instance, causing data leakage and race conditions. The solution refactors the SpanStore to organize spans by trace ID rather than in a flat list, enabling proper isolation between concurrent operations.
The key changes include: (1) Refactoring SpanStore
to use trace-based storage with new methods get_by_trace_id()
and clear_trace()
, (2) Updating FireworksTrainer._extract_message_history_from_spans()
to accept a trace_id parameter and filter spans accordingly, and (3) Replacing global span clearing with trace-specific cleanup to prevent interference between concurrent operations. This ensures each training operation only processes its own spans while maintaining backward compatibility for existing SpanStore methods.
Important Files Changed
Changed Files
Filename | Score | Overview |
---|---|---|
src/judgeval/tracer/exporters/store.py | 4/5 | Refactored SpanStore to organize spans by trace ID with new methods for trace-based operations |
src/judgeval/trainer/fireworks_trainer.py | 4/5 | Updated message history extraction to use trace-specific span filtering and cleanup |
Confidence score: 4/5
- This PR addresses a real concurrency issue but still has some style and type annotation concerns from previous reviews
- Score reflects solid implementation of trace isolation but deducted one point due to unaddressed style issues (missing return type annotations, potential silent failures, and trace_id typing)
- Pay close attention to the SpanStore refactor as it changes core data structures used across the tracing system
Sequence Diagram
sequenceDiagram
participant User
participant FireworksTrainer
participant TrainableModel
participant Tracer
participant SpanStore
participant AgentFunction
participant JudgmentClient
participant Dataset
User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
loop for each training step
FireworksTrainer->>TrainableModel: "advance_to_next_step(step)"
FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
loop for each prompt and generation
FireworksTrainer->>AgentFunction: "agent_function(**prompt_input)"
AgentFunction-->>FireworksTrainer: "response_data"
FireworksTrainer->>Tracer: "get_current_span()"
Tracer-->>FireworksTrainer: "current_span"
FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
SpanStore-->>FireworksTrainer: "spans"
FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
JudgmentClient-->>FireworksTrainer: "scoring_results"
FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
end
FireworksTrainer->>Dataset: "Dataset.from_list(dataset_rows)"
Dataset-->>FireworksTrainer: "dataset"
FireworksTrainer->>Dataset: "sync()"
FireworksTrainer->>TrainableModel: "perform_reinforcement_step(dataset, step)"
TrainableModel-->>FireworksTrainer: "job"
loop while job not completed
FireworksTrainer->>TrainableModel: "job.get()"
TrainableModel-->>FireworksTrainer: "updated_job"
end
FireworksTrainer->>Dataset: "delete()"
end
FireworksTrainer->>TrainableModel: "advance_to_next_step(num_steps)"
FireworksTrainer->>TrainableModel: "get_model_config(training_params)"
TrainableModel-->>FireworksTrainer: "model_config"
FireworksTrainer-->>User: "model_config"
2 files reviewed, no comments
π Summary
β Checklist