
Conversation

@abhishekg999
Contributor

πŸ“ Summary

  • 1. ...

βœ… Checklist

github-actions bot and others added 6 commits October 14, 2025 11:22
Co-authored-by: judgment-release-bot <219946635+judgment-release-bot@users.noreply.github.com>
Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>
…596)

* feat: immutable wrappers

* testing concepts of openai wrapper

* tests

* openai

* fix: error defaults on mutate args and kwargs

* split stream wrappers

* feat: anthropic wrapping

* minor

* test the tests

* anthropic code formats

* gemini

* google and gemini

* update lock

* fix: wrapper

* mypy

* mypy

* immutable wrapper where possible

* responses fix

* better safety access

* remove groq e2e

* fix together

* chore: greptile

* chore: iterator instead of generator

* run one set of uts

* add version matrix back

* add os matrix back

---------

Co-authored-by: Alan <alanzhang2021@gmail.com>
Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com>
Co-authored-by: Justin Sheu <justinsheu3341@gmail.com>
* Organized UTs

* fix flaky

* gemini and anthropic keys

---------

Co-authored-by: Justin Sheu <justinsheu3341@gmail.com>
* Cache Tokens Tests

* randomize
* only return project id

* update types
@abhishekg999
Contributor Author

@greptile

@gemini-code-assist
Contributor

Summary of Changes

Hello @abhishekg999, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a substantial architectural improvement to the LLM client tracing infrastructure. By introducing a new set of generic wrapping utilities, the integration of various LLM providers becomes more modular, robust, and easier to maintain. Concurrently, the decision was made to remove Groq integration and the local evaluation queue, streamlining the project's scope and internal processes. These changes aim to provide a more stable and extensible foundation for future development.

Highlights

  • LLM Client Wrapping Refactor: The core LLM client wrapping logic for OpenAI, Anthropic, Google GenAI, and Together has been significantly refactored into modular, reusable utility functions. This enhances maintainability and consistency across different LLM integrations.
  • New Generic Wrapper Utilities: Introduced a new set of generic immutable_wrap and mutable_wrap utilities for synchronous, asynchronous, and iterator functions. These provide standardized lifecycle hooks (pre, post, error, finally) for robust function and iterator interception; a sketch follows this list.
  • Groq Integration Removed: Support for Groq LLM clients has been entirely removed from the codebase, including its configuration, wrapper, and associated end-to-end tests.
  • Local Evaluation Queue Deprecated: The LocalEvaluationQueue module, previously used for batching local custom scorer evaluations, has been removed, indicating a shift in how local evaluations are managed.
  • API Type Simplification: The project_created boolean field has been removed from the ResolveProjectNameResponse API and data models, streamlining the project resolution response.
  • Enhanced dont_throw Decorator: The dont_throw decorator has been improved to support default return values and provide better type hinting, making error handling more flexible.
  • Comprehensive Testing: Extensive new tests have been added for the generic wrapper utilities and the refactored LLM client integrations, ensuring high code quality and reliability.
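
A minimal sketch of the lifecycle-hook pattern from the wrapper-utilities highlight above. The hook names and signatures here are assumptions inferred from this summary, not the actual judgeval API:

```python
from typing import Any, Callable, Optional, TypeVar

R = TypeVar("R")

def immutable_wrap_sync(
    fn: Callable[..., R],
    pre: Optional[Callable[..., None]] = None,
    post: Optional[Callable[[R], None]] = None,
    error: Optional[Callable[[Exception], None]] = None,
    final: Optional[Callable[[], None]] = None,  # "finally" is a keyword, so renamed here
) -> Callable[..., R]:
    """Wrap fn with observe-only hooks; the call and result pass through unchanged."""
    def wrapper(*args: Any, **kwargs: Any) -> R:
        if pre:
            pre(*args, **kwargs)  # observe the call; cannot modify it
        try:
            result = fn(*args, **kwargs)
            if post:
                post(result)  # observe the result; cannot replace it
            return result
        except Exception as e:
            if error:
                error(e)  # record the exception, then re-raise
            raise
        finally:
            if final:
                final()  # always runs, e.g. to end a span
    return wrapper
```

A mutable variant would differ by letting the pre hook rewrite args/kwargs and the post hook replace the result before it is returned.
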
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yaml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This is a major staging-to-main merge that refactors the LLM tracer architecture and removes Groq provider support. The key changes include:

Core Architecture Changes:

  • Complete removal of local evaluation queue functionality from the tracer, eliminating LocalEvaluationQueue and related worker threads
  • Refactoring of LLM provider wrappers from monolithic files to modular structures (e.g., OpenAI wrapper split into chat_completions.py, responses.py, beta_chat_completions.py)
  • Migration from centralized provider imports to direct imports in config modules

Groq Provider Removal:

  • Systematic removal of all Groq-related code including wrapper, config, tests, and provider type enum entry
  • Updates to CI workflow removing Groq client initialization and tests

New Wrapper Utilities System:

  • Introduction of comprehensive wrapper utilities under src/judgeval/utils/wrappers/ including sync/async variants for both mutable and immutable operations
  • New wrapper functions: immutable_wrap_sync, immutable_wrap_async, mutable_wrap_sync, mutable_wrap_async, and iterator variants
  • Enhanced error handling with dont_throw and identity_on_throw decorators (see the sketch below)
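
A rough sketch of the enhanced dont_throw behavior described above (default return value plus type hints); the exact signature in src/judgeval/utils/decorators/dont_throw.py is an assumption here:

```python
import functools
from typing import Any, Callable, Optional, TypeVar

R = TypeVar("R")

def dont_throw(
    default: Optional[R] = None,
) -> Callable[[Callable[..., R]], Callable[..., Optional[R]]]:
    """Swallow any exception from the wrapped function and return `default` instead."""
    def decorator(fn: Callable[..., R]) -> Callable[..., Optional[R]]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Optional[R]:
            try:
                return fn(*args, **kwargs)
            except Exception:
                return default
        return wrapper
    return decorator

@dont_throw(default=0)
def parse_count(raw: str) -> int:
    return int(raw)  # parse_count("oops") returns 0 instead of raising
```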

LLM Provider Enhancements:

  • Added support for new API endpoints (OpenAI's beta chat completions parse API, Anthropic streaming)
  • Expanded CI support for Gemini and Anthropic API keys
  • Comprehensive test coverage additions for all remaining providers

API Changes:

  • Removal of project_created field from ResolveProjectNameResponse in auto-generated API types
  • Pre-commit tool version updates (uv-pre-commit to 0.9.2, ruff to v0.14.0)

The refactoring follows a consistent pattern of modularizing complex wrappers while maintaining backward compatibility for public APIs.

Important Files Changed

Changed Files

| Filename | Score | Overview |
| --- | --- | --- |
| src/judgeval/tracer/__init__.py | 4/5 | Removed local evaluation queue functionality and simplified project resolution logic |
| src/judgeval/tracer/local_eval_queue.py | 2/5 | Completely removed file containing LocalEvaluationQueue implementation (199 lines deleted) |
| src/judgeval/tracer/llm/constants.py | 4/5 | Removed GROQ from ProviderType enum, a breaking change for Groq usage |
| src/judgeval/tracer/llm/config.py | 2/5 | Major refactor removing Groq support and changing provider detection to direct imports |
| src/judgeval/tracer/llm/llm_groq/wrapper.py | 1/5 | Entire Groq wrapper implementation deleted (498 lines removed) |
| src/judgeval/tracer/llm/llm_groq/config.py | 1/5 | Complete deletion of Groq configuration module |
| src/judgeval/tracer/llm/llm_openai/wrapper.py | 4/5 | Major refactor from monolithic (661 lines) to modular design (63 lines) |
| src/judgeval/tracer/llm/llm_anthropic/wrapper.py | 3/5 | Significant refactor with potential type checking issues from generic usage |
| src/judgeval/tracer/llm/llm_google/wrapper.py | 4/5 | Simplified from 465 lines to 30 lines by extracting logic to separate modules |
| src/judgeval/tracer/llm/llm_together/wrapper.py | 4/5 | Major refactor from ~500 lines to 52 lines with modular delegation |
| src/judgeval/utils/wrappers/__init__.py | 5/5 | New package interface exposing 6 wrapper functions with a clean public API |
| src/judgeval/utils/wrappers/immutable_wrap_sync.py | 5/5 | New synchronous wrapper with lifecycle hooks and error protection |
| src/judgeval/utils/wrappers/mutable_wrap_sync.py | 5/5 | New mutable synchronous wrapper allowing argument/result modification |
| src/judgeval/utils/decorators/dont_throw.py | 4/5 | Enhanced decorator with improved type safety and custom default values |
| src/e2etests/test_tracer.py | 5/5 | Clean removal of all Groq client integrations while preserving other providers |
| .github/workflows/ci.yaml | 4/5 | Added GEMINI_API_KEY and ANTHROPIC_API_KEY environment variables and parallel test execution |

Confidence score: 3/5

  • This PR contains significant architectural changes that require careful review due to the removal of major functionality (Groq support, local evaluation queue)
  • Score reflects the large scope of changes, potential breaking changes from Groq removal, and complex refactoring of core tracer components
  • Multiple files show concerning patterns like unsafe attribute access and potential type checking issues that need attention

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant GitHubActions as "GitHub Actions"
    participant CI as "CI Workflow"
    participant BranchValidator as "Branch Validator"
    participant TestRunner as "Test Runner"
    participant AWSSecretsManager as "AWS Secrets Manager"
    participant JudgmentServer as "Judgment Server"
    
    User->>GitHubActions: "Push to staging branch"
    GitHubActions->>CI: "Trigger CI workflow"
    
    CI->>BranchValidator: "Validate merge branch"
    BranchValidator-->>CI: "Validation result"
    
    alt Branch validation success or skipped
        par Unit Tests
            CI->>TestRunner: "Run unit tests (Python 3.10-3.13, Ubuntu/macOS)"
            TestRunner->>TestRunner: "Install dependencies with uv"
            TestRunner->>TestRunner: "Run pytest tests"
            TestRunner-->>CI: "Unit test results"
        and E2E Tests (if main/staging branch)
            CI->>TestRunner: "Configure AWS credentials"
            CI->>TestRunner: "Set environment variables"
            CI->>JudgmentServer: "Health check"
            JudgmentServer-->>CI: "Health status"
            
            alt Server is healthy
                CI->>AWSSecretsManager: "Retrieve secrets"
                AWSSecretsManager-->>CI: "API keys and secrets"
                CI->>TestRunner: "Run E2E tests with secrets"
                TestRunner->>TestRunner: "Execute pytest e2etests"
                TestRunner-->>CI: "Coverage report"
                CI->>GitHubActions: "Upload coverage artifacts"
            else Server unhealthy
                CI->>CI: "Fail job with error message"
            end
        end
    else Branch validation failed
        CI->>CI: "Skip tests"
    end
    
    CI-->>User: "CI results and artifacts"
```

Additional Comments (2)

  1. src/judgeval/tracer/llm/llm_groq/wrapper.py, line 1 (link)

    logic: Complete removal of Groq wrapper - this breaks backward compatibility for users depending on Groq tracing. Verify this is intentional and that migration path exists.

  2. src/judgeval/tracer/llm/llm_groq/config.py, line 1 (link)

    logic: Deleting this file will break imports that depend on HAS_GROQ, groq_Groq, and groq_AsyncGroq. Check that all references to these exports have been updated elsewhere in the codebase.

66 files reviewed, 33 comments


if span:
    span.record_exception(error)

wrapped = mutable_wrap_sync(
syntax: using mutable_wrap_sync for async function - should be mutable_wrap_async

(src/judgeval/tracer/llm/llm_anthropic/messages_stream.py, line 315)
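
Simplified stand-ins (not the actual judgeval signatures) showing why the variant matters: a sync wrapper calls the target directly, so for an async target the hook would see an un-awaited coroutine instead of the result.

```python
from typing import Any, Callable

def mutable_wrap_sync(fn: Callable[..., Any], post: Callable[[Any], Any]) -> Callable[..., Any]:
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)  # for an async fn, this is a coroutine
        return post(result)           # the hook never sees the real result
    return wrapper

def mutable_wrap_async(fn: Callable[..., Any], post: Callable[[Any], Any]) -> Callable[..., Any]:
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = await fn(*args, **kwargs)  # awaited value, as intended
        return post(result)
    return wrapper
```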

from judgeval.tracer.llm.llm_openai.config import HAS_OPENAI
from judgeval.tracer.llm.llm_together.config import HAS_TOGETHER
from judgeval.tracer.llm.llm_anthropic.config import HAS_ANTHROPIC
from judgeval.tracer.llm.llm_google.config import HAS_GOOGLE_GENAI
logic: Groq provider imports and exports were removed but HAS_GROQ flag is missing from the new imports

(src/judgeval/tracer/llm/providers.py, line 7)

span: The span to validate
attrs: Span attributes dictionary
expected_span_name: Expected span name (e.g., "ANTHROPIC_API_CALL")
expected_model_name: Expected model name (optional)
style: Docstring says 'optional' but the function signature shows expected_model_name as required parameter

(src/tests/tracer/llm/utils.py, line 28)

Comment on lines +92 to +94
if span.events:
    event_names = [event.name for event in span.events]
    assert any("exception" in name.lower() for name in event_names)
logic: Exception validation will pass if span.events is empty - should this be more explicit about requiring exception events?

(src/tests/tracer/llm/utils.py, lines 92-94)
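
A stricter variant of this check (sketch) that fails instead of passing vacuously when no events were recorded:

```python
assert span.events, "expected at least one event recorded on the span"
event_names = [event.name for event in span.events]
assert any("exception" in name.lower() for name in event_names)
```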

Comment on lines +26 to +30
if isinstance(client, Client):
    wrap_generate_content_sync(tracer, client)
    return client
else:
    raise TypeError(f"Invalid client type: {type(client)}")
logic: Missing async client support. The old implementation handled both sync (google_genai_Client) and async (google_genai_AsyncClient) clients, but the new implementation only handles sync clients via wrap_generate_content_sync.

(src/judgeval/tracer/llm/llm_google/wrapper.py, lines 26-30)
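
One way to restore the async path (sketch): wrap_generate_content_async is a hypothetical counterpart to the existing sync helper, and the AsyncClient import follows the old implementation referenced in the review below.

```python
def wrap_google_client(tracer, client):
    from google.genai import Client
    from google.genai.client import AsyncClient

    if isinstance(client, Client):
        wrap_generate_content_sync(tracer, client)
        return client
    elif isinstance(client, AsyncClient):
        wrap_generate_content_async(tracer, client)  # hypothetical async helper
        return client
    else:
        raise TypeError(f"Invalid client type: {type(client)}")
```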

def __init__(self):
    self.started_spans = []
    self.ended_spans = []
    self.resource_attributes = {}
style: resource_attributes is defined but never used in the class

(src/tests/tracer/llm/conftest.py, line 15)

# Set up minimal TracerProvider with mock processor
provider = TracerProvider()
provider.add_span_processor(mock_processor)
set_tracer_provider(provider)
style: Setting global tracer provider in tests can cause side effects between tests - consider using a context manager or cleanup

(src/tests/tracer/llm/conftest.py, line 67)
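
One way to contain the side effect (sketch): build the provider per test and shut it down afterwards, assuming a mock_processor fixture like the one in this conftest.

```python
import pytest
from opentelemetry.sdk.trace import TracerProvider

@pytest.fixture
def tracer_provider(mock_processor):
    provider = TracerProvider()
    provider.add_span_processor(mock_processor)
    yield provider
    provider.shutdown()  # flush and release resources after each test
```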

Comment on lines +77 to +80
@pytest.fixture
def tracer_with_mock(tracer):
    """Alias for tracer - both now use the mock processor"""
    return tracer
style: tracer_with_mock fixture is redundant - it just returns the same tracer fixture

(src/tests/tracer/llm/conftest.py, lines 77-80)

if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    cache_read = usage_data.input_tokens_details.cached_tokens or 0
logic: Direct attribute access could cause AttributeError if input_tokens_details is None. Consider using safe access pattern like the streaming implementation.

(src/judgeval/tracer/llm/llm_openai/responses.py, line 81)
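
A safe-access variant (sketch), along the lines the comment suggests:

```python
if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    details = getattr(usage_data, "input_tokens_details", None)
    cache_read = (details.cached_tokens if details is not None else 0) or 0
```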

if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    cache_read = usage_data.input_tokens_details.cached_tokens or 0
logic: Same unsafe attribute access issue as line 81 - should use getattr() with default value or null checking.

(src/judgeval/tracer/llm/llm_openai/responses.py, line 285)

@@ -0,0 +1,3 @@
# Wrapper Utilities

Ensure 100% test coverage for all files in this folder


[Documentation]

Add a period at the end of the sentence for proper punctuation.

(src/judgeval/utils/wrappers/README.md, line 3)

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and impressive refactoring of the LLM provider wrapping logic, moving from monolithic wrappers to a much more modular and maintainable structure using new wrapper utilities. This is a great improvement for the codebase. Key changes include the removal of support for Groq and the local evaluation queue, which are major breaking changes that should be clearly communicated to users. The wait_for_completion method has also been removed from the Tracer class. While these removals seem intentional and are applied consistently, the PR also appears to have unintentionally dropped support for Google's async client, which I've flagged as a high-severity regression. Overall, this is a very positive structural change, accompanied by an excellent suite of new tests.

Comment on lines 37 to +41
 if HAS_GOOGLE_GENAI:
-    from judgeval.tracer.llm.providers import (
-        google_genai_Client,
-        google_genai_AsyncClient,
-    )
-
-    assert google_genai_Client is not None, "Google GenAI client not found"
-    assert google_genai_AsyncClient is not None, (
-        "Google GenAI async client not found"
-    )
-    if isinstance(client, (google_genai_Client, google_genai_AsyncClient)):
-        return ProviderType.GOOGLE
+    from google.genai import Client as GoogleClient

-if HAS_GROQ:
-    from judgeval.tracer.llm.providers import groq_Groq, groq_AsyncGroq
-
-    assert groq_Groq is not None, "Groq client not found"
-    assert groq_AsyncGroq is not None, "Groq async client not found"
-    if isinstance(client, (groq_Groq, groq_AsyncGroq)):
-        return ProviderType.GROQ
+    if isinstance(client, GoogleClient):
+        return ProviderType.GOOGLE
high: This provider detection logic for Google GenAI seems to have dropped support for the AsyncClient. The previous implementation checked for both google.genai.Client and google.genai.client.AsyncClient. This change means async Google clients will no longer be correctly identified and wrapped, which is a regression. Was the removal of async support for Google GenAI intentional? If not, the check should be updated to include AsyncClient and the corresponding async wrapper logic should be implemented.

Comment on lines +13 to +30
+def wrap_google_client(tracer: Tracer, client: Client) -> Client:
+    from judgeval.tracer.llm.llm_google.config import HAS_GOOGLE_GENAI
+    from judgeval.logger import judgeval_logger
+
-def _format_google_output(
-    response: GoogleGenerateContentResponse,
-) -> Tuple[Optional[str], Optional[GoogleUsageMetadata]]:
-    message_content: Optional[str] = None
-    usage_data: Optional[GoogleUsageMetadata] = None
-
-    try:
-        if isinstance(response, GoogleGenerateContentResponse):
-            usage_data = response.usage_metadata
-            if response.candidates and len(response.candidates) > 0:
-                candidate = response.candidates[0]
-                if (
-                    candidate.content
-                    and candidate.content.parts
-                    and len(candidate.content.parts) > 0
-                ):
-                    message_content = candidate.content.parts[0].text
-    except (AttributeError, IndexError, TypeError):
-        pass
-
-    return message_content, usage_data
-
-
-class TracedGoogleGenerator:
-    def __init__(
-        self,
-        tracer: Tracer,
-        generator: Iterator[GoogleStreamChunk],
-        client: GoogleClientType,
-        span: Span,
-        model_name: str,
-    ):
-        self.tracer = tracer
-        self.generator = generator
-        self.client = client
-        self.span = span
-        self.model_name = model_name
-        self.accumulated_content = ""
-
-    def __iter__(self) -> Iterator[GoogleStreamChunk]:
-        return self
-
-    def __next__(self) -> GoogleStreamChunk:
-        try:
-            chunk = next(self.generator)
-            content = _extract_google_content(chunk)
-            if content:
-                self.accumulated_content += content
-            if chunk.usage_metadata:
-                prompt_tokens, completion_tokens, cache_read, cache_creation = (
-                    _extract_google_tokens(chunk.usage_metadata)
-                )
-                set_span_attribute(
-                    self.span, AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS, prompt_tokens
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
-                    completion_tokens,
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                    cache_read,
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.JUDGMENT_USAGE_METADATA,
-                    safe_serialize(chunk.usage_metadata),
-                )
-            return chunk
-        except StopIteration:
-            set_span_attribute(
-                self.span, AttributeKeys.GEN_AI_COMPLETION, self.accumulated_content
-            )
-            self.span.end()
-            raise
-        except Exception as e:
-            if self.span:
-                self.span.record_exception(e)
-                self.span.end()
-            raise
-
-
-class TracedGoogleAsyncGenerator:
-    def __init__(
-        self,
-        tracer: Tracer,
-        async_generator: AsyncIterator[GoogleStreamChunk],
-        client: GoogleClientType,
-        span: Span,
-        model_name: str,
-    ):
-        self.tracer = tracer
-        self.async_generator = async_generator
-        self.client = client
-        self.span = span
-        self.model_name = model_name
-        self.accumulated_content = ""
-
-    def __aiter__(self) -> AsyncIterator[GoogleStreamChunk]:
-        return self
-
-    async def __anext__(self) -> GoogleStreamChunk:
-        try:
-            chunk = await self.async_generator.__anext__()
-            content = _extract_google_content(chunk)
-            if content:
-                self.accumulated_content += content
-            if chunk.usage_metadata:
-                prompt_tokens, completion_tokens, cache_read, cache_creation = (
-                    _extract_google_tokens(chunk.usage_metadata)
-                )
-                set_span_attribute(
-                    self.span, AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS, prompt_tokens
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
-                    completion_tokens,
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                    cache_read,
-                )
-                set_span_attribute(
-                    self.span,
-                    AttributeKeys.JUDGMENT_USAGE_METADATA,
-                    safe_serialize(chunk.usage_metadata),
-                )
-            return chunk
-        except StopAsyncIteration:
-            set_span_attribute(
-                self.span, AttributeKeys.GEN_AI_COMPLETION, self.accumulated_content
-            )
-            self.span.end()
-            raise
-        except Exception as e:
-            if self.span:
-                self.span.record_exception(e)
-                self.span.end()
-            raise
-
-
-def wrap_google_client(tracer: Tracer, client: GoogleClientType) -> GoogleClientType:
-    def wrapped(function: Callable, span_name: str):
-        @functools.wraps(function)
-        def wrapper(*args, **kwargs):
-            if kwargs.get("stream", False):
-                try:
-                    span = tracer.get_tracer().start_span(
-                        span_name, attributes={AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
-                    )
-                    tracer.add_agent_attributes_to_span(span)
-                    set_span_attribute(
-                        span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
-                    )
-                    model_name = kwargs.get("model", "")
-                    set_span_attribute(
-                        span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
-                    )
-                except Exception as e:
-                    judgeval_logger.error(
-                        f"[google wrapped] Error adding span metadata: {e}"
-                    )
-                stream_response = function(*args, **kwargs)
-                return TracedGoogleGenerator(
-                    tracer, stream_response, client, span, model_name
-                )
-            else:
-                with sync_span_context(
-                    tracer, span_name, {AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
-                ) as span:
-                    try:
-                        tracer.add_agent_attributes_to_span(span)
-                        set_span_attribute(
-                            span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
-                        )
-                        model_name = kwargs.get("model", "")
-                        set_span_attribute(
-                            span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
-                        )
-                    except Exception as e:
-                        judgeval_logger.error(
-                            f"[google wrapped] Error adding span metadata: {e}"
-                        )
-
-                    response = function(*args, **kwargs)
-
-                    try:
-                        if isinstance(response, GoogleGenerateContentResponse):
-                            output, usage_data = _format_google_output(response)
-                            set_span_attribute(
-                                span, AttributeKeys.GEN_AI_COMPLETION, output
-                            )
-                            if usage_data:
-                                (
-                                    prompt_tokens,
-                                    completion_tokens,
-                                    cache_read,
-                                    cache_creation,
-                                ) = _extract_google_tokens(usage_data)
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS,
-                                    prompt_tokens,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
-                                    completion_tokens,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                                    cache_read,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.JUDGMENT_USAGE_METADATA,
-                                    safe_serialize(usage_data),
-                                )
-                            set_span_attribute(
-                                span,
-                                AttributeKeys.GEN_AI_RESPONSE_MODEL,
-                                getattr(response, "model_version", model_name),
-                            )
-                    except Exception as e:
-                        judgeval_logger.error(
-                            f"[google wrapped] Error adding span metadata: {e}"
-                        )
-                    finally:
-                        return response
-
-        return wrapper
-
-    def wrapped_async(function: Callable, span_name: str):
-        @functools.wraps(function)
-        async def wrapper(*args, **kwargs):
-            if kwargs.get("stream", False):
-                try:
-                    span = tracer.get_tracer().start_span(
-                        span_name, attributes={AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
-                    )
-                    tracer.add_agent_attributes_to_span(span)
-                    set_span_attribute(
-                        span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
-                    )
-                    model_name = kwargs.get("model", "")
-                    set_span_attribute(
-                        span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
-                    )
-                except Exception as e:
-                    judgeval_logger.error(
-                        f"[google wrapped_async] Error adding span metadata: {e}"
-                    )
-                stream_response = await function(*args, **kwargs)
-                return TracedGoogleAsyncGenerator(
-                    tracer, stream_response, client, span, model_name
-                )
-            else:
-                async with async_span_context(
-                    tracer, span_name, {AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
-                ) as span:
-                    try:
-                        tracer.add_agent_attributes_to_span(span)
-                        set_span_attribute(
-                            span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
-                        )
-                        model_name = kwargs.get("model", "")
-                        set_span_attribute(
-                            span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
-                        )
-                    except Exception as e:
-                        judgeval_logger.error(
-                            f"[google wrapped_async] Error adding span metadata: {e}"
-                        )
-
-                    response = await function(*args, **kwargs)
-
-                    try:
-                        if isinstance(response, GoogleGenerateContentResponse):
-                            output, usage_data = _format_google_output(response)
-                            set_span_attribute(
-                                span, AttributeKeys.GEN_AI_COMPLETION, output
-                            )
-                            if usage_data:
-                                (
-                                    prompt_tokens,
-                                    completion_tokens,
-                                    cache_read,
-                                    cache_creation,
-                                ) = _extract_google_tokens(usage_data)
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS,
-                                    prompt_tokens,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
-                                    completion_tokens,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                                    cache_read,
-                                )
-                                set_span_attribute(
-                                    span,
-                                    AttributeKeys.JUDGMENT_USAGE_METADATA,
-                                    safe_serialize(usage_data),
-                                )
-                            set_span_attribute(
-                                span,
-                                AttributeKeys.GEN_AI_RESPONSE_MODEL,
-                                getattr(response, "model_version", model_name),
-                            )
-                    except Exception as e:
-                        judgeval_logger.error(
-                            f"[google wrapped_async] Error adding span metadata: {e}"
-                        )
-                    finally:
-                        return response
-
-        return wrapper
-
-    span_name = "GOOGLE_API_CALL"
-    if google_genai_Client is not None and isinstance(client, google_genai_Client):
-        # Type narrowing for mypy
-        google_client = client  # type: ignore[assignment]
-        setattr(
-            google_client.models,
-            "generate_content",
-            wrapped(google_client.models.generate_content, span_name),
-        )
-    elif google_genai_AsyncClient is not None and isinstance(
-        client, google_genai_AsyncClient
-    ):
-        # Type narrowing for mypy
-        async_google_client = client  # type: ignore[assignment]
-        setattr(
-            async_google_client.models,
-            "generate_content",
-            wrapped_async(async_google_client.models.generate_content, span_name),
-        )
-
-    return client
+    if not HAS_GOOGLE_GENAI:
+        judgeval_logger.error(
+            "Cannot wrap Google GenAI client: 'google-genai' library not installed. "
+            "Install it with: pip install google-genai"
+        )
+        return client
+
+    from google.genai import Client
+
+    if isinstance(client, Client):
+        wrap_generate_content_sync(tracer, client)
+        return client
+    else:
+        raise TypeError(f"Invalid client type: {type(client)}")
high: This wrapper only handles the synchronous google.genai.Client. The previous implementation also supported the asynchronous AsyncClient. Removing this support is a significant breaking change. If this was unintentional, please add back support for AsyncClient by creating an async version of the wrapper, similar to how it's handled for OpenAI and Anthropic providers.

Comment on lines +948 to 950
judgeval_logger.warning(
    "The scorer provided is not hosted, skipping evaluation."
)
medium: This warning is helpful, but it could be more actionable for the user. Consider suggesting which types of scorers are supported for async_evaluate (e.g., ExampleAPIScorerConfig or server-hosted scorers) so the user knows how to fix their code.
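
A sketch of a more actionable message (the wording here is illustrative, not the library's actual text):

```python
judgeval_logger.warning(
    "The scorer provided is not hosted, skipping evaluation. "
    "async_evaluate supports ExampleAPIScorerConfig or server-hosted scorers."
)
```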
