Staging -> Main #603
Conversation
Co-authored-by: judgment-release-bot <219946635+judgment-release-bot@users.noreply.github.com> Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>
…596) * feat: immutable wrappers * testing concepts of openai wrapper * tests * openai * fix: error defaults on mutate args and kwargs * split stream wrappers * feat: anthropic wrapping * minor * test the tests * anthropic code formats * gemini * google and gemini * update lock * fix: wrapper * mypy * mypy * immutable wrapper where possible * responses fix * better safety access * remove groq e2e * fix together * chore: greptile * chore: iterator instead of generator * run one set of uts * add version matrix back * add os matrix back --------- Co-authored-by: Alan <alanzhang2021@gmail.com> Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com> Co-authored-by: Justin Sheu <justinsheu3341@gmail.com>
* Organized UTs * fix flaky * gemini and anthropic keys --------- Co-authored-by: Justin Sheu <justinsheu3341@gmail.com>
* Cache Tokens Tests * randomize
* only return project id * update types
Summary of Changes
Hello @abhishekg999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request represents a substantial architectural improvement to the LLM client tracing infrastructure. By introducing a new set of generic wrapping utilities, the integration of various LLM providers becomes more modular, robust, and easier to maintain. Concurrently, Groq integration and the local evaluation queue have been removed, streamlining the project's scope and internal processes. These changes aim to provide a more stable and extensible foundation for future development.
Greptile Overview
Greptile Summary
This is a major staging-to-main merge that refactors the LLM tracer architecture and removes Groq provider support. The key changes include:
Core Architecture Changes:
- Complete removal of local evaluation queue functionality from the tracer, eliminating `LocalEvaluationQueue` and related worker threads
- Refactoring of LLM provider wrappers from monolithic files to modular structures (e.g., the OpenAI wrapper split into `chat_completions.py`, `responses.py`, and `beta_chat_completions.py`)
- Migration from centralized provider imports to direct imports in config modules
Groq Provider Removal:
- Systematic removal of all Groq-related code including wrapper, config, tests, and provider type enum entry
- Updates to CI workflow removing Groq client initialization and tests
New Wrapper Utilities System:
- Introduction of comprehensive wrapper utilities under `src/judgeval/utils/wrappers/`, including sync/async variants for both mutable and immutable operations
- New wrapper functions: `immutable_wrap_sync`, `immutable_wrap_async`, `mutable_wrap_sync`, `mutable_wrap_async`, and iterator variants
- Enhanced error handling with `dont_throw` and `identity_on_throw` decorators (a sketch of the wrapper pattern follows this list)
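The concrete signatures live under `src/judgeval/utils/wrappers/`; purely as an illustration of the pattern (the hook names and shapes below are assumptions, not the actual API), the immutable sync variant might look like:

```python
import functools
from typing import Any, Callable, Optional, TypeVar

R = TypeVar("R")


def immutable_wrap_sync(
    func: Callable[..., R],
    before: Optional[Callable[..., None]] = None,      # assumed hook name
    after: Optional[Callable[[R], None]] = None,        # assumed hook name
    on_error: Optional[Callable[[Exception], None]] = None,  # assumed hook name
) -> Callable[..., R]:
    """Wrap func with lifecycle hooks without mutating its args or its result."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> R:
        if before:
            try:
                before(*args, **kwargs)
            except Exception:
                pass  # hooks must never break the wrapped call
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            if on_error:
                try:
                    on_error(exc)
                except Exception:
                    pass
            raise  # the original exception always propagates unchanged
        if after:
            try:
                after(result)
            except Exception:
                pass
        return result

    return wrapper
```

The mutable variants would differ by letting the hooks rewrite the arguments and the result before they reach the wrapped call and the caller.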
LLM Provider Enhancements:
- Added support for new API endpoints (OpenAI's beta chat completions parse API, Anthropic streaming)
- Expanded CI support for Gemini and Anthropic API keys
- Comprehensive test coverage additions for all remaining providers
API Changes:
- Removal of the `project_created` field from `ResolveProjectNameResponse` in auto-generated API types
- Pre-commit tool version updates (uv-pre-commit to 0.9.2, ruff to v0.14.0)
The refactoring follows a consistent pattern of modularizing complex wrappers while maintaining backward compatibility for public APIs.
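The enhanced `dont_throw` decorator listed above is described as adding custom default values; a minimal sketch of that idea (assumed signature, not the actual `src/judgeval/utils/decorators/dont_throw.py`) could be:

```python
import functools
import logging
from typing import Any, Callable


def dont_throw(default: Any = None) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Suppress exceptions from the decorated function and return `default` instead."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception:
                # Log and swallow so tracing failures never break the caller.
                logging.getLogger(__name__).debug("suppressed exception in %s", func.__name__)
                return default

        return wrapper

    return decorator
```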
Important Files Changed
Changed Files
| Filename | Score | Overview |
|---|---|---|
| `src/judgeval/tracer/__init__.py` | 4/5 | Removed local evaluation queue functionality and simplified project resolution logic |
| `src/judgeval/tracer/local_eval_queue.py` | 2/5 | Completely removed file containing the `LocalEvaluationQueue` implementation (199 lines deleted) |
| `src/judgeval/tracer/llm/constants.py` | 4/5 | Removed GROQ from the `ProviderType` enum, a breaking change for Groq usage |
| `src/judgeval/tracer/llm/config.py` | 2/5 | Major refactor removing Groq support and changing provider detection to direct imports |
| `src/judgeval/tracer/llm/llm_groq/wrapper.py` | 1/5 | Entire Groq wrapper implementation deleted (498 lines removed) |
| `src/judgeval/tracer/llm/llm_groq/config.py` | 1/5 | Complete deletion of the Groq configuration module |
| `src/judgeval/tracer/llm/llm_openai/wrapper.py` | 4/5 | Major refactor from monolithic (661 lines) to modular design (63 lines) |
| `src/judgeval/tracer/llm/llm_anthropic/wrapper.py` | 3/5 | Significant refactor with potential type checking issues from generic usage |
| `src/judgeval/tracer/llm/llm_google/wrapper.py` | 4/5 | Simplified from 465 lines to 30 lines by extracting logic to separate modules |
| `src/judgeval/tracer/llm/llm_together/wrapper.py` | 4/5 | Major refactor from ~500 lines to 52 lines with modular delegation |
| `src/judgeval/utils/wrappers/__init__.py` | 5/5 | New package interface exposing 6 wrapper functions with a clean public API |
| `src/judgeval/utils/wrappers/immutable_wrap_sync.py` | 5/5 | New synchronous wrapper with lifecycle hooks and error protection |
| `src/judgeval/utils/wrappers/mutable_wrap_sync.py` | 5/5 | New mutable synchronous wrapper allowing argument/result modification |
| `src/judgeval/utils/decorators/dont_throw.py` | 4/5 | Enhanced decorator with improved type safety and custom default values |
| `src/e2etests/test_tracer.py` | 5/5 | Clean removal of all Groq client integrations while preserving other providers |
| `.github/workflows/ci.yaml` | 4/5 | Added GEMINI_API_KEY and ANTHROPIC_API_KEY environment variables and parallel test execution |
Confidence score: 3/5
- This PR contains significant architectural changes that require careful review due to the removal of major functionality (Groq support, local evaluation queue)
- Score reflects the large scope of changes, potential breaking changes from Groq removal, and complex refactoring of core tracer components
- Multiple files show concerning patterns like unsafe attribute access and potential type checking issues that need attention
Sequence Diagram
sequenceDiagram
participant User
participant GitHubActions as "GitHub Actions"
participant CI as "CI Workflow"
participant BranchValidator as "Branch Validator"
participant TestRunner as "Test Runner"
participant AWSSecretsManager as "AWS Secrets Manager"
participant JudgmentServer as "Judgment Server"
User->>GitHubActions: "Push to staging branch"
GitHubActions->>CI: "Trigger CI workflow"
CI->>BranchValidator: "Validate merge branch"
BranchValidator-->>CI: "Validation result"
alt Branch validation success or skipped
par Unit Tests
CI->>TestRunner: "Run unit tests (Python 3.10-3.13, Ubuntu/macOS)"
TestRunner->>TestRunner: "Install dependencies with uv"
TestRunner->>TestRunner: "Run pytest tests"
TestRunner-->>CI: "Unit test results"
and E2E Tests (if main/staging branch)
CI->>TestRunner: "Configure AWS credentials"
CI->>TestRunner: "Set environment variables"
CI->>JudgmentServer: "Health check"
JudgmentServer-->>CI: "Health status"
alt Server is healthy
CI->>AWSSecretsManager: "Retrieve secrets"
AWSSecretsManager-->>CI: "API keys and secrets"
CI->>TestRunner: "Run E2E tests with secrets"
TestRunner->>TestRunner: "Execute pytest e2etests"
TestRunner-->>CI: "Coverage report"
CI->>GitHubActions: "Upload coverage artifacts"
else Server unhealthy
CI->>CI: "Fail job with error message"
end
end
else Branch validation failed
CI->>CI: "Skip tests"
end
CI-->>User: "CI results and artifacts"
Additional Comments (2)
- `src/judgeval/tracer/llm/llm_groq/wrapper.py`, line 1 — **logic:** Complete removal of the Groq wrapper breaks backward compatibility for users depending on Groq tracing. Verify this is intentional and that a migration path exists.
- `src/judgeval/tracer/llm/llm_groq/config.py`, line 1 — **logic:** Deleting this file will break imports that depend on `HAS_GROQ`, `groq_Groq`, and `groq_AsyncGroq`. Check that all references to these exports have been updated elsewhere in the codebase.
66 files reviewed, 33 comments
if span:
    span.record_exception(error)

wrapped = mutable_wrap_sync(
**syntax:** using `mutable_wrap_sync` for an async function - should be `mutable_wrap_async` (src/judgeval/tracer/llm/llm_anthropic/messages_stream.py, line 315)
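For illustration only (toy wrappers, not the judgeval API): a sync-style wrapper applied to a coroutine function never awaits it, so its hooks see an un-awaited coroutine object rather than the real result, which is why the async variant is needed here.

```python
import asyncio


def wrap_sync(func):
    # Toy sync-style wrapper: if func is async, `result` is a coroutine object,
    # so any hook looking at it never observes the actual return value.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result
    return wrapper


def wrap_async(func):
    # Toy async-aware wrapper: awaiting inside the wrapper lets hooks observe the result.
    async def wrapper(*args, **kwargs):
        result = await func(*args, **kwargs)
        return result
    return wrapper


@wrap_async
async def fetch() -> int:
    return 42


print(asyncio.run(fetch()))  # 42
```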
from judgeval.tracer.llm.llm_openai.config import HAS_OPENAI
from judgeval.tracer.llm.llm_together.config import HAS_TOGETHER
from judgeval.tracer.llm.llm_anthropic.config import HAS_ANTHROPIC
from judgeval.tracer.llm.llm_google.config import HAS_GOOGLE_GENAI
**logic:** Groq provider imports and exports were removed, but the `HAS_GROQ` flag is missing from the new imports (src/judgeval/tracer/llm/providers.py, line 7)
        span: The span to validate
        attrs: Span attributes dictionary
        expected_span_name: Expected span name (e.g., "ANTHROPIC_API_CALL")
        expected_model_name: Expected model name (optional)
**style:** Docstring says 'optional' but the function signature shows `expected_model_name` as a required parameter (src/tests/tracer/llm/utils.py, line 28)
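One way to reconcile the docstring and the signature (a sketch only; the function name and remaining parameters of the real helper in `src/tests/tracer/llm/utils.py` are assumptions) is to give the parameter a `None` default:

```python
from typing import Any, Dict, Optional


def validate_span(
    span: Any,
    attrs: Dict[str, Any],
    expected_span_name: str,
    expected_model_name: Optional[str] = None,  # now genuinely optional, matching the docstring
) -> None:
    assert span.name == expected_span_name
    if expected_model_name is not None:
        # Only checked when a model name is supplied.
        assert expected_model_name in attrs.values()
```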
if span.events:
    event_names = [event.name for event in span.events]
    assert any("exception" in name.lower() for name in event_names)
**logic:** Exception validation will pass if `span.events` is empty - should this be more explicit about requiring exception events? (src/tests/tracer/llm/utils.py, lines 92-94)
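A stricter variant that fails when no events were recorded at all could look like this (sketch; the helper name is an assumption and the surrounding test utilities are not shown):

```python
def assert_exception_event(span) -> None:
    # Fail loudly when the span recorded no events, instead of silently passing.
    assert span.events, "expected the span to record at least one exception event"
    event_names = [event.name for event in span.events]
    assert any("exception" in name.lower() for name in event_names), (
        f"no exception event found among {event_names}"
    )
```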
if isinstance(client, Client):
    wrap_generate_content_sync(tracer, client)
    return client
else:
    raise TypeError(f"Invalid client type: {type(client)}")
**logic:** Missing async client support. The old implementation handled both sync (`google_genai_Client`) and async (`google_genai_AsyncClient`) clients, but the new implementation only handles sync clients via `wrap_generate_content_sync`. (src/judgeval/tracer/llm/llm_google/wrapper.py, lines 26-30)
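If async support is to be restored, the dispatch could mirror the sync path; `wrap_generate_content_async` below is hypothetical (only the sync helper exists in this PR), and the `AsyncClient` import path follows the old implementation referenced in the comments:

```python
from google.genai import Client
from google.genai.client import AsyncClient  # path as used by the previous implementation


def wrap_google_client(tracer, client):
    if isinstance(client, Client):
        wrap_generate_content_sync(tracer, client)   # existing sync wrapper from this PR
        return client
    if isinstance(client, AsyncClient):
        wrap_generate_content_async(tracer, client)  # hypothetical async counterpart
        return client
    raise TypeError(f"Invalid client type: {type(client)}")
```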
def __init__(self):
    self.started_spans = []
    self.ended_spans = []
    self.resource_attributes = {}
**style:** `resource_attributes` is defined but never used in the class (src/tests/tracer/llm/conftest.py, line 15)
# Set up minimal TracerProvider with mock processor
provider = TracerProvider()
provider.add_span_processor(mock_processor)
set_tracer_provider(provider)
**style:** Setting the global tracer provider in tests can cause side effects between tests - consider using a context manager or cleanup (src/tests/tracer/llm/conftest.py, line 67)
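One way to avoid mutating global state is to hand the locally built provider to the code under test and shut it down afterwards; a sketch, assuming `mock_processor` is available as a fixture as the quoted snippet suggests:

```python
import pytest
from opentelemetry.sdk.trace import TracerProvider


@pytest.fixture
def tracer_provider(mock_processor):
    # Build an isolated provider instead of calling set_tracer_provider() globally.
    provider = TracerProvider()
    provider.add_span_processor(mock_processor)
    yield provider
    provider.shutdown()  # flush processors so spans do not leak into the next test
```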
@pytest.fixture
def tracer_with_mock(tracer):
    """Alias for tracer - both now use the mock processor"""
    return tracer
**style:** the `tracer_with_mock` fixture is redundant - it just returns the same `tracer` fixture (src/tests/tracer/llm/conftest.py, lines 77-80)
if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    cache_read = usage_data.input_tokens_details.cached_tokens or 0
**logic:** Direct attribute access could cause an AttributeError if `input_tokens_details` is None. Consider using a safe access pattern like the streaming implementation. (src/judgeval/tracer/llm/llm_openai/responses.py, line 81)
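The safer pattern the comment points to might look roughly like this (a sketch against the fragment quoted above, not the actual fix in the PR):

```python
if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    # Guard against input_tokens_details being None before reaching into it.
    details = getattr(usage_data, "input_tokens_details", None)
    cache_read = (details.cached_tokens if details else 0) or 0
```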
if usage_data:
    prompt_tokens = usage_data.input_tokens or 0
    completion_tokens = usage_data.output_tokens or 0
    cache_read = usage_data.input_tokens_details.cached_tokens or 0
**logic:** Same unsafe attribute access issue as line 81 - should use getattr() with a default value or null checking. (src/judgeval/tracer/llm/llm_openai/responses.py, line 285)
@@ -0,0 +1,3 @@
# Wrapper Utilities

Ensure 100% test coverage for all files in this folder
[Documentation] Add a period at the end of the sentence for proper punctuation. (src/judgeval/utils/wrappers/README.md, line 3)
Code Review
This pull request introduces a significant and impressive refactoring of the LLM provider wrapping logic, moving from monolithic wrappers to a much more modular and maintainable structure using new wrapper utilities. This is a great improvement for the codebase. Key changes include the removal of support for Groq and the local evaluation queue, which are major breaking changes that should be clearly communicated to users. The wait_for_completion method has also been removed from the Tracer class. While these removals seem intentional and are applied consistently, the PR also appears to have unintentionally dropped support for Google's async client, which I've flagged as a high-severity regression. Overall, this is a very positive structural change, accompanied by an excellent suite of new tests.
if HAS_GOOGLE_GENAI:
    from judgeval.tracer.llm.providers import (
        google_genai_Client,
        google_genai_AsyncClient,
    )

    assert google_genai_Client is not None, "Google GenAI client not found"
    assert google_genai_AsyncClient is not None, (
        "Google GenAI async client not found"
    )
    if isinstance(client, (google_genai_Client, google_genai_AsyncClient)):
        return ProviderType.GOOGLE
from google.genai import Client as GoogleClient

if HAS_GROQ:
    from judgeval.tracer.llm.providers import groq_Groq, groq_AsyncGroq

    assert groq_Groq is not None, "Groq client not found"
    assert groq_AsyncGroq is not None, "Groq async client not found"
    if isinstance(client, (groq_Groq, groq_AsyncGroq)):
        return ProviderType.GROQ
if isinstance(client, GoogleClient):
    return ProviderType.GOOGLE
This provider detection logic for Google GenAI seems to have dropped support for the AsyncClient. The previous implementation checked for both google.genai.Client and google.genai.client.AsyncClient. This change means async Google clients will no longer be correctly identified and wrapped, which is a regression. Was the removal of async support for Google GenAI intentional? If not, the check should be updated to include AsyncClient and the corresponding async wrapper logic should be implemented.
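A possible shape for a detection branch that recognises both client types again (a sketch; the `AsyncClient` import path follows the old implementation referenced in this comment):

```python
from google.genai import Client as GoogleClient
from google.genai.client import AsyncClient as GoogleAsyncClient


def _is_google_client(client: object) -> bool:
    # Covers both the sync and async Google GenAI clients.
    return isinstance(client, (GoogleClient, GoogleAsyncClient))
```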
def wrap_google_client(tracer: Tracer, client: Client) -> Client:
    from judgeval.tracer.llm.llm_google.config import HAS_GOOGLE_GENAI
    from judgeval.logger import judgeval_logger

def _format_google_output(
    response: GoogleGenerateContentResponse,
) -> Tuple[Optional[str], Optional[GoogleUsageMetadata]]:
    message_content: Optional[str] = None
    usage_data: Optional[GoogleUsageMetadata] = None

    try:
        if isinstance(response, GoogleGenerateContentResponse):
            usage_data = response.usage_metadata
            if response.candidates and len(response.candidates) > 0:
                candidate = response.candidates[0]
                if (
                    candidate.content
                    and candidate.content.parts
                    and len(candidate.content.parts) > 0
                ):
                    message_content = candidate.content.parts[0].text
    except (AttributeError, IndexError, TypeError):
        pass

    return message_content, usage_data


class TracedGoogleGenerator:
    def __init__(
        self,
        tracer: Tracer,
        generator: Iterator[GoogleStreamChunk],
        client: GoogleClientType,
        span: Span,
        model_name: str,
    ):
        self.tracer = tracer
        self.generator = generator
        self.client = client
        self.span = span
        self.model_name = model_name
        self.accumulated_content = ""

    def __iter__(self) -> Iterator[GoogleStreamChunk]:
        return self

    def __next__(self) -> GoogleStreamChunk:
        try:
            chunk = next(self.generator)
            content = _extract_google_content(chunk)
            if content:
                self.accumulated_content += content
            if chunk.usage_metadata:
                prompt_tokens, completion_tokens, cache_read, cache_creation = (
                    _extract_google_tokens(chunk.usage_metadata)
                )
                set_span_attribute(
                    self.span, AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS, prompt_tokens
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
                    completion_tokens,
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
                    cache_read,
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.JUDGMENT_USAGE_METADATA,
                    safe_serialize(chunk.usage_metadata),
                )
            return chunk
        except StopIteration:
            set_span_attribute(
                self.span, AttributeKeys.GEN_AI_COMPLETION, self.accumulated_content
            )
            self.span.end()
            raise
        except Exception as e:
            if self.span:
                self.span.record_exception(e)
                self.span.end()
            raise


class TracedGoogleAsyncGenerator:
    def __init__(
        self,
        tracer: Tracer,
        async_generator: AsyncIterator[GoogleStreamChunk],
        client: GoogleClientType,
        span: Span,
        model_name: str,
    ):
        self.tracer = tracer
        self.async_generator = async_generator
        self.client = client
        self.span = span
        self.model_name = model_name
        self.accumulated_content = ""

    def __aiter__(self) -> AsyncIterator[GoogleStreamChunk]:
        return self

    async def __anext__(self) -> GoogleStreamChunk:
        try:
            chunk = await self.async_generator.__anext__()
            content = _extract_google_content(chunk)
            if content:
                self.accumulated_content += content
            if chunk.usage_metadata:
                prompt_tokens, completion_tokens, cache_read, cache_creation = (
                    _extract_google_tokens(chunk.usage_metadata)
                )
                set_span_attribute(
                    self.span, AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS, prompt_tokens
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
                    completion_tokens,
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
                    cache_read,
                )
                set_span_attribute(
                    self.span,
                    AttributeKeys.JUDGMENT_USAGE_METADATA,
                    safe_serialize(chunk.usage_metadata),
                )
            return chunk
        except StopAsyncIteration:
            set_span_attribute(
                self.span, AttributeKeys.GEN_AI_COMPLETION, self.accumulated_content
            )
            self.span.end()
            raise
        except Exception as e:
            if self.span:
                self.span.record_exception(e)
                self.span.end()
            raise


def wrap_google_client(tracer: Tracer, client: GoogleClientType) -> GoogleClientType:
    def wrapped(function: Callable, span_name: str):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            if kwargs.get("stream", False):
                try:
                    span = tracer.get_tracer().start_span(
                        span_name, attributes={AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
                    )
                    tracer.add_agent_attributes_to_span(span)
                    set_span_attribute(
                        span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
                    )
                    model_name = kwargs.get("model", "")
                    set_span_attribute(
                        span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
                    )
                except Exception as e:
                    judgeval_logger.error(
                        f"[google wrapped] Error adding span metadata: {e}"
                    )
                stream_response = function(*args, **kwargs)
                return TracedGoogleGenerator(
                    tracer, stream_response, client, span, model_name
                )
            else:
                with sync_span_context(
                    tracer, span_name, {AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
                ) as span:
                    try:
                        tracer.add_agent_attributes_to_span(span)
                        set_span_attribute(
                            span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
                        )
                        model_name = kwargs.get("model", "")
                        set_span_attribute(
                            span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
                        )
                    except Exception as e:
                        judgeval_logger.error(
                            f"[google wrapped] Error adding span metadata: {e}"
                        )

                    response = function(*args, **kwargs)

                    try:
                        if isinstance(response, GoogleGenerateContentResponse):
                            output, usage_data = _format_google_output(response)
                            set_span_attribute(
                                span, AttributeKeys.GEN_AI_COMPLETION, output
                            )
                            if usage_data:
                                (
                                    prompt_tokens,
                                    completion_tokens,
                                    cache_read,
                                    cache_creation,
                                ) = _extract_google_tokens(usage_data)
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS,
                                    prompt_tokens,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
                                    completion_tokens,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
                                    cache_read,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.JUDGMENT_USAGE_METADATA,
                                    safe_serialize(usage_data),
                                )
                            set_span_attribute(
                                span,
                                AttributeKeys.GEN_AI_RESPONSE_MODEL,
                                getattr(response, "model_version", model_name),
                            )
                    except Exception as e:
                        judgeval_logger.error(
                            f"[google wrapped] Error adding span metadata: {e}"
                        )
                    finally:
                        return response

        return wrapper

    def wrapped_async(function: Callable, span_name: str):
        @functools.wraps(function)
        async def wrapper(*args, **kwargs):
            if kwargs.get("stream", False):
                try:
                    span = tracer.get_tracer().start_span(
                        span_name, attributes={AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
                    )
                    tracer.add_agent_attributes_to_span(span)
                    set_span_attribute(
                        span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
                    )
                    model_name = kwargs.get("model", "")
                    set_span_attribute(
                        span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
                    )
                except Exception as e:
                    judgeval_logger.error(
                        f"[google wrapped_async] Error adding span metadata: {e}"
                    )
                stream_response = await function(*args, **kwargs)
                return TracedGoogleAsyncGenerator(
                    tracer, stream_response, client, span, model_name
                )
            else:
                async with async_span_context(
                    tracer, span_name, {AttributeKeys.JUDGMENT_SPAN_KIND: "llm"}
                ) as span:
                    try:
                        tracer.add_agent_attributes_to_span(span)
                        set_span_attribute(
                            span, AttributeKeys.GEN_AI_PROMPT, safe_serialize(kwargs)
                        )
                        model_name = kwargs.get("model", "")
                        set_span_attribute(
                            span, AttributeKeys.GEN_AI_REQUEST_MODEL, model_name
                        )
                    except Exception as e:
                        judgeval_logger.error(
                            f"[google wrapped_async] Error adding span metadata: {e}"
                        )

                    response = await function(*args, **kwargs)

                    try:
                        if isinstance(response, GoogleGenerateContentResponse):
                            output, usage_data = _format_google_output(response)
                            set_span_attribute(
                                span, AttributeKeys.GEN_AI_COMPLETION, output
                            )
                            if usage_data:
                                (
                                    prompt_tokens,
                                    completion_tokens,
                                    cache_read,
                                    cache_creation,
                                ) = _extract_google_tokens(usage_data)
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_INPUT_TOKENS,
                                    prompt_tokens,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_OUTPUT_TOKENS,
                                    completion_tokens,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
                                    cache_read,
                                )
                                set_span_attribute(
                                    span,
                                    AttributeKeys.JUDGMENT_USAGE_METADATA,
                                    safe_serialize(usage_data),
                                )
                            set_span_attribute(
                                span,
                                AttributeKeys.GEN_AI_RESPONSE_MODEL,
                                getattr(response, "model_version", model_name),
                            )
                    except Exception as e:
                        judgeval_logger.error(
                            f"[google wrapped_async] Error adding span metadata: {e}"
                        )
                    finally:
                        return response

        return wrapper

    span_name = "GOOGLE_API_CALL"
    if google_genai_Client is not None and isinstance(client, google_genai_Client):
        # Type narrowing for mypy
        google_client = client  # type: ignore[assignment]
        setattr(
            google_client.models,
            "generate_content",
            wrapped(google_client.models.generate_content, span_name),
        )
    elif google_genai_AsyncClient is not None and isinstance(
        client, google_genai_AsyncClient
    ):
        # Type narrowing for mypy
        async_google_client = client  # type: ignore[assignment]
        setattr(
            async_google_client.models,
            "generate_content",
            wrapped_async(async_google_client.models.generate_content, span_name),
    if not HAS_GOOGLE_GENAI:
        judgeval_logger.error(
            "Cannot wrap Google GenAI client: 'google-genai' library not installed. "
            "Install it with: pip install google-genai"
        )
        return client

    from google.genai import Client

    return client
    if isinstance(client, Client):
        wrap_generate_content_sync(tracer, client)
        return client
    else:
        raise TypeError(f"Invalid client type: {type(client)}")
This wrapper only handles the synchronous google.genai.Client. The previous implementation also supported the asynchronous AsyncClient. Removing this support is a significant breaking change. If this was unintentional, please add back support for AsyncClient by creating an async version of the wrapper, similar to how it's handled for OpenAI and Anthropic providers.
judgeval_logger.warning(
    "The scorer provided is not hosted, skipping evaluation."
)
Summary

Checklist