Skip to content

Conversation

Mandolaro
Copy link
Collaborator

πŸ“ Summary

  • 1. Fix Span Store

βœ… Checklist

Copy link
Contributor

Summary of Changes

Hello @Mandolaro, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue in the SpanStore by refactoring its internal storage mechanism to organize spans by their trace ID. This change ensures that tracing data is properly isolated and managed for individual traces, preventing potential data mixing or incorrect clearing, particularly in contexts like the FireworksTrainer where specific trace histories are crucial for training. The update enhances the accuracy and reliability of span retrieval and cleanup operations.

Highlights

  • SpanStore Refactoring: The SpanStore class has been refactored to store and manage spans on a per-trace-ID basis, moving from a flat list of all spans to a dictionary mapping trace IDs to lists of their respective spans. This improves isolation and management of tracing data.
  • New SpanStore Methods: New abstract methods get_by_trace_id and clear_trace have been added to ABCSpanStore and implemented in SpanStore. These methods allow for retrieving and clearing spans associated with a specific trace ID.
  • FireworksTrainer Integration: The FireworksTrainer has been updated to leverage the new trace-ID-specific span management. It now extracts message history and clears spans based on the current trace ID, ensuring that training data is correctly isolated per generation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with πŸ‘ and πŸ‘Ž on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

@Mandolaro Mandolaro requested a review from alanzhang25 October 16, 2025 06:27
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes a critical concurrency issue in the tracing system by introducing trace-specific span management. The changes refactor the shared span store architecture to prevent data corruption when multiple training operations run concurrently.

The SpanStore class is redesigned to organize spans by trace ID using a dictionary structure (_spans_by_trace) instead of a flat list. This includes new methods get_by_trace_id() and clear_trace() for trace-specific operations, enabling O(1) access to spans for a given trace rather than O(n) linear searches.

The FireworksTrainer is updated to use these trace-specific operations. The _extract_message_history_from_spans method now accepts a trace_id parameter and filters spans accordingly. During training execution, the code extracts the current trace ID from the span context and uses it for both message history extraction and cleanup, replacing the previous global span clearing that could affect concurrent operations.

These changes ensure proper isolation between concurrent training operations, where each trace only accesses its own spans, making the system safe for multi-threaded or multi-process environments.

Important Files Changed

Changed Files
Filename Score Overview
src/judgeval/tracer/exporters/store.py 4/5 Refactored SpanStore to use trace-grouped dictionary structure with new trace-specific methods
src/judgeval/trainer/fireworks_trainer.py 4/5 Updated training logic to use trace-specific span operations and proper trace ID extraction

Confidence score: 4/5

  • This PR addresses a critical concurrency bug and is generally safe to merge with good architectural improvements
  • Score reflects well-structured changes that follow good software design patterns, but deducted one point due to potential edge cases around span context availability and error handling
  • Pay close attention to the trace ID extraction logic in fireworks_trainer.py and ensure proper testing of concurrent scenarios

Sequence Diagram

sequenceDiagram
    participant User
    participant FireworksTrainer
    participant SpanStore
    participant Tracer
    participant AgentFunction
    participant JudgmentClient
    
    User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
    FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
    
    loop for each training step
        FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
        
        loop for each prompt and generation
            FireworksTrainer->>Tracer: "@tracer.observe(span_type='function')"
            FireworksTrainer->>AgentFunction: "await agent_function(**prompt_input)"
            AgentFunction-->>FireworksTrainer: "response_data"
            
            FireworksTrainer->>Tracer: "get_current_span()"
            Tracer-->>FireworksTrainer: "current_span"
            FireworksTrainer->>FireworksTrainer: "extract trace_id from span context"
            
            FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
            SpanStore-->>FireworksTrainer: "spans for trace"
            FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
            
            FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
            JudgmentClient-->>FireworksTrainer: "scoring_results"
            
            FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
            SpanStore->>SpanStore: "delete spans for trace_id"
        end
        
        FireworksTrainer->>FireworksTrainer: "perform_reinforcement_step(dataset)"
    end
    
    FireworksTrainer-->>User: "ModelConfig"
Loading

2 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@Mandolaro Mandolaro changed the base branch from main to staging October 16, 2025 06:28
@Mandolaro Mandolaro closed this Oct 16, 2025
@Mandolaro Mandolaro reopened this Oct 16, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the SpanStore to be trace-aware by partitioning spans by trace_id. This is an excellent approach to address potential concurrency issues arising from a shared SpanStore in an asynchronous environment. The changes in fireworks_trainer.py correctly utilize this new functionality to isolate spans for each generation, which should resolve the underlying issue of mixed-up traces. The implementation is solid, and I've added a few minor suggestions to improve code conciseness and adhere to Pythonic idioms.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes a critical concurrency issue in the FireworksTrainer where multiple training operations were sharing a single SpanStore instance, causing data leakage and race conditions. The solution refactors the SpanStore to organize spans by trace ID rather than in a flat list, enabling proper isolation between concurrent operations.

The key changes include: (1) Refactoring SpanStore to use trace-based storage with new methods get_by_trace_id() and clear_trace(), (2) Updating FireworksTrainer._extract_message_history_from_spans() to accept a trace_id parameter and filter spans accordingly, and (3) Replacing global span clearing with trace-specific cleanup to prevent interference between concurrent operations. This ensures each training operation only processes its own spans while maintaining backward compatibility for existing SpanStore methods.

Important Files Changed

Changed Files
Filename Score Overview
src/judgeval/tracer/exporters/store.py 4/5 Refactored SpanStore to organize spans by trace ID with new methods for trace-based operations
src/judgeval/trainer/fireworks_trainer.py 4/5 Updated message history extraction to use trace-specific span filtering and cleanup

Confidence score: 4/5

  • This PR addresses a real concurrency issue but still has some style and type annotation concerns from previous reviews
  • Score reflects solid implementation of trace isolation but deducted one point due to unaddressed style issues (missing return type annotations, potential silent failures, and trace_id typing)
  • Pay close attention to the SpanStore refactor as it changes core data structures used across the tracing system

Sequence Diagram

sequenceDiagram
    participant User
    participant FireworksTrainer
    participant TrainableModel
    participant Tracer
    participant SpanStore
    participant AgentFunction
    participant JudgmentClient
    participant Dataset

    User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
    FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
    
    loop for each training step
        FireworksTrainer->>TrainableModel: "advance_to_next_step(step)"
        FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
        
        loop for each prompt and generation
            FireworksTrainer->>AgentFunction: "agent_function(**prompt_input)"
            AgentFunction-->>FireworksTrainer: "response_data"
            
            FireworksTrainer->>Tracer: "get_current_span()"
            Tracer-->>FireworksTrainer: "current_span"
            
            FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
            SpanStore-->>FireworksTrainer: "spans"
            
            FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
            
            FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
            JudgmentClient-->>FireworksTrainer: "scoring_results"
            
            FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
        end
        
        FireworksTrainer->>Dataset: "Dataset.from_list(dataset_rows)"
        Dataset-->>FireworksTrainer: "dataset"
        
        FireworksTrainer->>Dataset: "sync()"
        
        FireworksTrainer->>TrainableModel: "perform_reinforcement_step(dataset, step)"
        TrainableModel-->>FireworksTrainer: "job"
        
        loop while job not completed
            FireworksTrainer->>TrainableModel: "job.get()"
            TrainableModel-->>FireworksTrainer: "updated_job"
        end
        
        FireworksTrainer->>Dataset: "delete()"
    end
    
    FireworksTrainer->>TrainableModel: "advance_to_next_step(num_steps)"
    FireworksTrainer->>TrainableModel: "get_model_config(training_params)"
    TrainableModel-->>FireworksTrainer: "model_config"
    FireworksTrainer-->>User: "model_config"
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@Mandolaro Mandolaro merged commit 2d07e47 into staging Oct 17, 2025
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant