Fix Sharing SpanStore #608

Mandolaro · 2025-10-16T06:27:12Z

📝 Summary

1. Fix Span Store

✅ Checklist

Docs updated (if necessary)
Changelogs are updated (if necessary)

gemini-code-assist · 2025-10-16T06:27:23Z

Summary of Changes

Hello @Mandolaro, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue in the SpanStore by refactoring its internal storage mechanism to organize spans by their trace ID. This change ensures that tracing data is properly isolated and managed for individual traces, preventing potential data mixing or incorrect clearing, particularly in contexts like the FireworksTrainer where specific trace histories are crucial for training. The update enhances the accuracy and reliability of span retrieval and cleanup operations.

Highlights

SpanStore Refactoring: The SpanStore class has been refactored to store and manage spans on a per-trace-ID basis, moving from a flat list of all spans to a dictionary mapping trace IDs to lists of their respective spans. This improves isolation and management of tracing data.
New SpanStore Methods: New abstract methods get_by_trace_id and clear_trace have been added to ABCSpanStore and implemented in SpanStore. These methods allow for retrieving and clearing spans associated with a specific trace ID.
FireworksTrainer Integration: The FireworksTrainer has been updated to leverage the new trace-ID-specific span management. It now extracts message history and clears spans based on the current trace ID, ensuring that training data is correctly isolated per generation.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

greptile-apps

Greptile Overview

Greptile Summary

This PR fixes a critical concurrency issue in the tracing system by introducing trace-specific span management. The changes refactor the shared span store architecture to prevent data corruption when multiple training operations run concurrently.

The SpanStore class is redesigned to organize spans by trace ID using a dictionary structure (_spans_by_trace) instead of a flat list. This includes new methods get_by_trace_id() and clear_trace() for trace-specific operations, enabling O(1) access to spans for a given trace rather than O(n) linear searches.

The FireworksTrainer is updated to use these trace-specific operations. The _extract_message_history_from_spans method now accepts a trace_id parameter and filters spans accordingly. During training execution, the code extracts the current trace ID from the span context and uses it for both message history extraction and cleanup, replacing the previous global span clearing that could affect concurrent operations.

These changes ensure proper isolation between concurrent training operations, where each trace only accesses its own spans, making the system safe for multi-threaded or multi-process environments.

Important Files Changed

Changed Files

Filename	Score	Overview
src/judgeval/tracer/exporters/store.py	4/5	Refactored SpanStore to use trace-grouped dictionary structure with new trace-specific methods
src/judgeval/trainer/fireworks_trainer.py	4/5	Updated training logic to use trace-specific span operations and proper trace ID extraction

Confidence score: 4/5

This PR addresses a critical concurrency bug and is generally safe to merge with good architectural improvements
Score reflects well-structured changes that follow good software design patterns, but deducted one point due to potential edge cases around span context availability and error handling
Pay close attention to the trace ID extraction logic in fireworks_trainer.py and ensure proper testing of concurrent scenarios

Sequence Diagram

sequenceDiagram
    participant User
    participant FireworksTrainer
    participant SpanStore
    participant Tracer
    participant AgentFunction
    participant JudgmentClient
    
    User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
    FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
    
    loop for each training step
        FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
        
        loop for each prompt and generation
            FireworksTrainer->>Tracer: "@tracer.observe(span_type='function')"
            FireworksTrainer->>AgentFunction: "await agent_function(**prompt_input)"
            AgentFunction-->>FireworksTrainer: "response_data"
            
            FireworksTrainer->>Tracer: "get_current_span()"
            Tracer-->>FireworksTrainer: "current_span"
            FireworksTrainer->>FireworksTrainer: "extract trace_id from span context"
            
            FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
            SpanStore-->>FireworksTrainer: "spans for trace"
            FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
            
            FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
            JudgmentClient-->>FireworksTrainer: "scoring_results"
            
            FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
            SpanStore->>SpanStore: "delete spans for trace_id"
        end
        
        FireworksTrainer->>FireworksTrainer: "perform_reinforcement_step(dataset)"
    end
    
    FireworksTrainer-->>User: "ModelConfig"

_{2 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

src/judgeval/tracer/exporters/store.py

gemini-code-assist

Code Review

This pull request refactors the SpanStore to be trace-aware by partitioning spans by trace_id. This is an excellent approach to address potential concurrency issues arising from a shared SpanStore in an asynchronous environment. The changes in fireworks_trainer.py correctly utilize this new functionality to isolate spans for each generation, which should resolve the underlying issue of mixed-up traces. The implementation is solid, and I've added a few minor suggestions to improve code conciseness and adhere to Pythonic idioms.

src/judgeval/tracer/exporters/store.py

src/judgeval/trainer/fireworks_trainer.py

greptile-apps

Greptile Overview

Greptile Summary

This PR fixes a critical concurrency issue in the FireworksTrainer where multiple training operations were sharing a single SpanStore instance, causing data leakage and race conditions. The solution refactors the SpanStore to organize spans by trace ID rather than in a flat list, enabling proper isolation between concurrent operations.

The key changes include: (1) Refactoring SpanStore to use trace-based storage with new methods get_by_trace_id() and clear_trace(), (2) Updating FireworksTrainer._extract_message_history_from_spans() to accept a trace_id parameter and filter spans accordingly, and (3) Replacing global span clearing with trace-specific cleanup to prevent interference between concurrent operations. This ensures each training operation only processes its own spans while maintaining backward compatibility for existing SpanStore methods.

Important Files Changed

Changed Files

Filename	Score	Overview
src/judgeval/tracer/exporters/store.py	4/5	Refactored SpanStore to organize spans by trace ID with new methods for trace-based operations
src/judgeval/trainer/fireworks_trainer.py	4/5	Updated message history extraction to use trace-specific span filtering and cleanup

Confidence score: 4/5

This PR addresses a real concurrency issue but still has some style and type annotation concerns from previous reviews
Score reflects solid implementation of trace isolation but deducted one point due to unaddressed style issues (missing return type annotations, potential silent failures, and trace_id typing)
Pay close attention to the SpanStore refactor as it changes core data structures used across the tracing system

Sequence Diagram

sequenceDiagram
    participant User
    participant FireworksTrainer
    participant TrainableModel
    participant Tracer
    participant SpanStore
    participant AgentFunction
    participant JudgmentClient
    participant Dataset

    User->>FireworksTrainer: "train(agent_function, scorers, prompts)"
    FireworksTrainer->>FireworksTrainer: "run_reinforcement_learning()"
    
    loop for each training step
        FireworksTrainer->>TrainableModel: "advance_to_next_step(step)"
        FireworksTrainer->>FireworksTrainer: "generate_rollouts_and_rewards()"
        
        loop for each prompt and generation
            FireworksTrainer->>AgentFunction: "agent_function(**prompt_input)"
            AgentFunction-->>FireworksTrainer: "response_data"
            
            FireworksTrainer->>Tracer: "get_current_span()"
            Tracer-->>FireworksTrainer: "current_span"
            
            FireworksTrainer->>SpanStore: "get_by_trace_id(trace_id)"
            SpanStore-->>FireworksTrainer: "spans"
            
            FireworksTrainer->>FireworksTrainer: "_extract_message_history_from_spans()"
            
            FireworksTrainer->>JudgmentClient: "run_evaluation(examples, scorers)"
            JudgmentClient-->>FireworksTrainer: "scoring_results"
            
            FireworksTrainer->>SpanStore: "clear_trace(trace_id)"
        end
        
        FireworksTrainer->>Dataset: "Dataset.from_list(dataset_rows)"
        Dataset-->>FireworksTrainer: "dataset"
        
        FireworksTrainer->>Dataset: "sync()"
        
        FireworksTrainer->>TrainableModel: "perform_reinforcement_step(dataset, step)"
        TrainableModel-->>FireworksTrainer: "job"
        
        loop while job not completed
            FireworksTrainer->>TrainableModel: "job.get()"
            TrainableModel-->>FireworksTrainer: "updated_job"
        end
        
        FireworksTrainer->>Dataset: "delete()"
    end
    
    FireworksTrainer->>TrainableModel: "advance_to_next_step(num_steps)"
    FireworksTrainer->>TrainableModel: "get_model_config(training_params)"
    TrainableModel-->>FireworksTrainer: "model_config"
    FireworksTrainer-->>User: "model_config"

_{2 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

src/judgeval/tracer/exporters/store.py

fix multithreading

380d23f

Mandolaro requested a review from alanzhang25 October 16, 2025 06:27

greptile-apps bot reviewed Oct 16, 2025

View reviewed changes

src/judgeval/tracer/exporters/store.py Outdated Show resolved Hide resolved

src/judgeval/tracer/exporters/store.py Outdated Show resolved Hide resolved

src/judgeval/tracer/exporters/store.py Show resolved Hide resolved

Mandolaro changed the base branch from main to staging October 16, 2025 06:28

Mandolaro closed this Oct 16, 2025

Mandolaro reopened this Oct 16, 2025

gemini-code-assist bot reviewed Oct 16, 2025

View reviewed changes

src/judgeval/tracer/exporters/store.py Show resolved Hide resolved

src/judgeval/tracer/exporters/store.py Show resolved Hide resolved

src/judgeval/trainer/fireworks_trainer.py Outdated Show resolved Hide resolved

greptile-apps bot reviewed Oct 16, 2025

View reviewed changes

propel-code-bot bot reviewed Oct 16, 2025

View reviewed changes

src/judgeval/tracer/exporters/store.py Outdated Show resolved Hide resolved

Mandolaro and others added 3 commits October 15, 2025 23:32

fix based on comments

a49a278

fix based on comments

9714e5d

Merge branch 'staging' into trainer_fix

c1d9cb0

propel-code-bot bot reviewed Oct 16, 2025

View reviewed changes

src/judgeval/tracer/exporters/store.py Outdated Show resolved Hide resolved

remove unused

6bd1529

Mandolaro merged commit 2d07e47 into staging Oct 17, 2025
17 of 18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix Sharing SpanStore #608

Fix Sharing SpanStore #608

Uh oh!

Mandolaro commented Oct 16, 2025

Uh oh!

gemini-code-assist bot commented Oct 16, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fix Sharing SpanStore #608

Fix Sharing SpanStore #608

Uh oh!

Conversation

Mandolaro commented Oct 16, 2025

📝 Summary

✅ Checklist

Uh oh!

gemini-code-assist bot commented Oct 16, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 4/5

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 4/5

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant