test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData by Yun-Kim · Pull Request #17876 · DataDog/dd-trace-py

Yun-Kim · 2026-05-04T21:44:29Z

Summary

Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from meta_struct[\"_llmobs\"] instead of the projected LLMObsSpanEvent captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence (MLOB-4925).

Claude session: 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4
Resume: claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4

🤖 Generated with Claude Code

…fixture Preparation for migrating the four internal SDK test files (test_llmobs.py, test_llmobs_decorators.py, test_llmobs_service.py, test_propagation.py) onto assert_llmobs_span_data — same pattern as the contrib migrations in #17767, #17788, #17789, #17801, #17805. Setting the env var inside the fixture is benign for tests that still assert on llmobs_events: it only skips the post-enqueue meta_struct scrub in _on_span_finish, so the writer continues to receive projected LLMObsSpanEvents unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switch the 5 tests that read llmobs_events to read spans directly via test_spans + _get_llmobs_data_metastruct. The remaining tests already assert on span.context._meta or use subprocess-driven distributed header tests and don't go through the projection step. The wire-only _dd.apm_trace_id field is replaced by format_trace_id(span.trace_id), which is its definition (see ddtrace/llmobs/_llmobs.py:633), and span_id is read off the Span directly. parent_id and the LLMObs trace_id come straight from meta_struct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…anData All 51 test items in this file go through the same _expected_llmobs_* projection helpers — those helpers reconstruct what the wire event should look like rather than validate the projection step itself, so asserting on the canonical LLMObsSpanData at meta_struct["_llmobs"] is equivalent for every assertion the tests make. Mechanical conversions: - The 2nd positional arg of _expected_llmobs_* (e.g. "llm", "agent") becomes span_kind=... - token_metrics= renamed to metrics= - is_decorator=True folded into tags={"decorator": "1"} (subset match) - session_id="..." folded into tags={"session_id": "..."} - error=, error_message=, error_stack= triple folded into a single error={"type": ..., "message": ..., "stack": ...} dict, with the fields read off span.get_tag("error.type"|"error.message"|"error.stack") for the cases that didn't pin them explicitly Two unparseable-output tests stayed on a direct meta_struct assertion (meta.output == {}) since assert_llmobs_span_data skips checks when output_messages= is None, and the test specifically verifies absence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

60 tests migrate to assert_llmobs_span_data on the canonical payload at meta_struct["_llmobs"], following the same pattern as test_propagation.py and test_llmobs_decorators.py. Two tests intentionally stay on llmobs_events: - test_malformed_span_logs_error_instead_of_raising — the malformed span never reaches _prepare_llmobs_span_data, so wire-event count is the cleanest signal that no event was emitted. - TestLLMIOProcessing.test_processor_omit_span — span_processor suppression is a wire-level side effect (the writer is skipped); the meta_struct payload is still populated. The omit semantics are about what reaches the wire, not what's on the canonical payload. Tests that read APM trace IDs (formerly span_event["_dd"]["apm_trace_id"]) now read format_trace_id(span.trace_id) directly off the underlying APM span, matching the test_propagation.py migration. apm_trace_id is projection-only — it's computed at _llmobs_span_event time, not stored on meta_struct. Tests that previously took only `tracer` as a fixture and pulled `llmobs` in transitively via `llmobs_events` now take `llmobs` directly. Without the LLMObs fixture enabled, _annotate_llmobs_span_data wouldn't reach the meta_struct path. Cleaned up unused `llmobs_events` fixture parameters from a handful of tests that asserted only on side-effects (LLMOBS_SUBMITTED_TAG_KEY tag, local-root trace_id propagation, APM trace dropping). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Migrate the ~20 tests that asserted on llmobs_events / _expected_llmobs_* to assert_llmobs_span_data on the canonical meta_struct["_llmobs"] payload. Test classes covered: span creation (test_llm_span, test_task_span, etc.), session_id, error handling, ml_app override, annotation-context trace structure, LLMObs↔APM parenting, and TestBuildSpanEventFromMetaStructE2E. Tests intentionally kept on llmobs_events / writer mocks: - All 11 eval-metric-writer tests (test_submit_evaluation_*) — they assert on mock_llmobs_eval_metric_writer.enqueue.assert_called_with() and use _expected_llmobs_eval_metric_event(); these test the eval metric submission API, not the span payload. - test_listener_hooks_enqueue_correct_writer (subprocess writer-routing) - test_llmobs_with_evaluation_runner_does_not_enqueue_non_llm_spans (mocks evaluator_runner.enqueue) - 9 fork tests (subprocess-based, separate assertion style) - test_llmobs_with_evaluator_runner Non-mechanical adaptations (consistent with the test_llmobs.py and test_propagation.py migrations): - span_id is added at projection time (not stored on meta_struct), so span-id assertions read str(span.span_id) directly off the Span. - apm_trace_id is projection-only; reads use format_trace_id(span.trace_id). - status is computed from span.error at projection time; replaced event["status"] == "ok"/"error" with assert span.error == 0/1. - Multi-span tests bind each span to a distinct `as` variable so the per-span assert_llmobs_span_data call has a definite Span reference; previously the tests indexed into the wire event list by ordinal. _expected_llmobs_eval_metric_event remains imported and is still used by the retained eval-metric-writer tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d13692302

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

cit-pr-commenter-54b7da · 2026-05-04T21:51:36Z

Codeowners resolved as

ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
tests/llmobs/conftest.py                                                @DataDog/ml-observability
tests/llmobs/test_llmobs.py                                             @DataDog/ml-observability
tests/llmobs/test_llmobs_decorators.py                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_propagation.py                                        @DataDog/ml-observability

…bscripts Address PR review feedback: prefer the typed accessors in ddtrace.llmobs._utils over reaching into the LLMObsSpanData dict at test sites. Same coverage; reads now go through the same API the SDK itself uses. Substitutions: ["trace_id"] -> get_llmobs_trace_id(span) ["parent_id"] -> get_llmobs_parent_id(span) ["name"] -> get_llmobs_span_name(span) ["session_id"] -> get_llmobs_session_id(span) ["tags"]["ml_app"] -> get_llmobs_ml_app(span) ["meta"]["input"] -> get_llmobs_input(span) ["meta"]["output"] -> get_llmobs_output(span) ["meta"]["input"]["messages"] -> get_llmobs_input_messages(span) ["meta"]["input"]["value"] -> get_llmobs_input_value(span) ["meta"]["input"]["prompt"] -> get_llmobs_input_prompt(span) ["meta"]["model_name"] -> get_llmobs_model_name(span) ["meta"]["model_provider"]-> get_llmobs_model_provider(span) ["meta"]["metadata"]["_dd"]["cost_tags"] -> get_llmobs_cost_tags(span) Filter expressions [s for ... if _get_llmobs_data_metastruct(s)] are rewritten as ... if get_llmobs_span_kind(s)] — equivalent (a span with LLMObsSpanData on meta_struct always has a span_kind set) and uses the typed helper. _get_llmobs_data_metastruct still appears in two places where there is no helper: feeding the canonical payload to assert_llmobs_span_data (intentional — that's the helper's input contract), and a single meta.error read in test_error_span (no error helper exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The three TestMLApp tests in test_llmobs.py originally asserted that "ml_app:..." appeared in the projected event's tag list. The previous helper sweep switched them to get_llmobs_ml_app(span), which reads the top-level meta_struct["_llmobs"]["ml_app"] field — a different attribute than the tag the tests are named for. Both fields get set, but the test names ("test_tag_defaults_to_env_var", "test_tag_overrides_env_var") make clear the tag form is what's under test. Switch back to get_llmobs_tags(span)["ml_app"], which reads the tag dict on meta_struct (the canonical form of what becomes "ml_app:..." in the projected event tag list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Consolidate the meta_struct scrub in _on_span_finish to a single point that fires whenever no event will reach the LLMObs backend — covering all four "won't export" paths: 1. _prepare_llmobs_span_data returns False because meta_struct is empty 2. _prepare_llmobs_span_data returns False because span_kind is missing (malformed span) 3. _prepare_llmobs_span_data returns False because the user-registered span_processor returned None (explicit drop) 4. _llmobs_span_event raises and the exception is caught Previously only the kept path scrubbed (after writer.enqueue), so any of the above left the LLMObs payload in place on the underlying APM span. That's a no-op today (the LLMObs writer never enqueued, and APM doesn't read meta_struct["_llmobs"] yet), but it becomes a leak post-convergence (MLOB-4925): once APM ships meta_struct["_llmobs"] to the LLMObs backend, an "omitted/malformed/errored" span whose payload was left intact would arrive at LLMObs anyway — exactly the data the SDK said wouldn't be exported. The new scrub is unconditional (overrides _DD_LLMOBS_TEST_KEEP_META_STRUCT), because the contract for these paths is "no LLMObs data on this span" regardless of test mode. The existing kept-path scrub at line 539-540 keeps its existing semantics — that's the only place the env var suppresses the scrub. Test changes: - test_processor_omit_span: rewritten to assert the contract directly (omit ⇒ _get_llmobs_data_metastruct(span) == {}, kept ⇒ payload intact). - test_malformed_span_logs_error_instead_of_raising: same shape — drops llmobs_events and asserts empty meta_struct on the malformed span. Both rewrites remove the last llmobs_events references in tests/llmobs/test_llmobs.py and the four migrated SDK test files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lpers Two cleanups in test_llmobs.py and test_llmobs_service.py: 1. Drop the test_spans fixture parameter from 12 tests that captured spans directly via `with llmobs.X() as span:` and never call test_spans.pop()/pop_traces(). The fixture was a holdover from the contrib pattern (which extracts spans by trace iteration); these tests have direct references and don't need it. 2. Replace assert_llmobs_span_data calls in 11 single-attribute tests with the corresponding get_llmobs_*(span) helper. Each test was originally a singular I/O assertion (input_value, input_messages, output_messages, output_value, metadata, metrics, span_name, span_kind, model_name, model_provider) — assert_llmobs_span_data is overkill for one-attribute checks. Drops the assert_llmobs_span_data import from this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…p_level_field The test's whole subject is "session_id is recoverable from a span set via the session_id kwarg" — a single-attribute check. Replace assert_llmobs_span_data(... tags={"session_id": ...}) with get_llmobs_session_id(span) == session_id, which both reads more directly and matches the test name (top-level field, not the tag form). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… tests Eight span-creation tests in test_llmobs_service.py (test_llm_span, test_llm_span_no_model_sets_default, test_tool_span, test_task_span, test_workflow_span, test_agent_span, test_embedding_span, test_embedding_span_no_model_sets_default) followed a redundant pattern: with llmobs.X(...) as span: assert span.name == ... # APM span field assert get_llmobs_span_kind(span) == ... # LLMObs field (inside) ... assert_llmobs_span_data(..., span_kind=..., ...) # SAME LLMObs fields, AGAIN The post-finish assert_llmobs_span_data fully duplicates the inside-block get_llmobs_* checks (these fields are set at span start by _activate_llmobs_span and aren't mutated by _normalize_llmobs_meta on finish). Move the get_llmobs_* checks out of the with block (so they run on the post-finish span) and drop the assert_llmobs_span_data calls. APM span.name/resource/span_type checks stay inside the block — those are independent. For test_llm_span_no_model_sets_default and test_embedding_span_no_model_sets_default, the inside-block originally only checked model_name; the after-block added span_kind and model_provider as well. After the move, all three fields are checked post-finish via single-attribute helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ruff check --fix is a no-op (passes). ruff format collapses 5 multi-line assert_llmobs_span_data calls in test_ml_app_override into single lines that fit the configured line length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

datadog-prod-us1-3 · 2026-05-06T17:13:18Z

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 7956e93 | Docs | Datadog PR Page | Give us feedback!}

Yun-Kim · 2026-05-06T19:19:55Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-06T19:20:00Z

View all feedbacks in Devflow UI.

2026-05-06 19:19:59 UTC ℹ️ Start processing command /merge

2026-05-06 19:20:09 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2026-05-06 19:24:14 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 53m (p90).

2026-05-06 20:01:31 UTC ℹ️ MergeQueue: This merge request was merged

…ata (#17876) ## Summary Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from `meta_struct[\"_llmobs\"]` instead of the projected `LLMObsSpanEvent` captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence ([MLOB-4925](https://datadoghq.atlassian.net/browse/MLOB-4925)). Claude session: `0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` Resume: `claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` 🤖 Generated with [Claude Code](https://claude.com/claude-code) [MLOB-4925]: https://datadoghq.atlassian.net/browse/MLOB-4925?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: yun.kim <yun.kim@datadoghq.com>

Yun-Kim and others added 5 commits May 4, 2026 16:49

Yun-Kim requested a review from a team as a code owner May 4, 2026 21:44

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread tests/llmobs/test_llmobs_service.py Outdated

Yun-Kim and others added 3 commits May 6, 2026 10:46

Yun-Kim force-pushed the yunkim/llmobs-test-meta-struct-internal branch from 71e53e0 to f6d1ad1 Compare May 6, 2026 15:42

Yun-Kim and others added 4 commits May 6, 2026 12:32

simplify comment

442fb66

Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label May 6, 2026

chore(llmobs): ruff format

236fa09

ruff check --fix is a no-op (passes). ruff format collapses 5 multi-line assert_llmobs_span_data calls in test_ml_app_override into single lines that fit the configured line length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'main' into yunkim/llmobs-test-meta-struct-internal

7956e93

ZStriker19 approved these changes May 6, 2026

View reviewed changes

gh-worker-dd-mergequeue-cf854d Bot merged commit b8b617c into main May 6, 2026
559 checks passed

gh-worker-dd-mergequeue-cf854d Bot deleted the yunkim/llmobs-test-meta-struct-internal branch May 6, 2026 20:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData#17876

test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData#17876
gh-worker-dd-mergequeue-cf854d[bot] merged 14 commits into
mainfrom
yunkim/llmobs-test-meta-struct-internal

Yun-Kim commented May 4, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

cit-pr-commenter-54b7da Bot commented May 4, 2026 •

edited

Loading

Uh oh!

datadog-prod-us1-3 Bot commented May 6, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

Uh oh!

Yun-Kim commented May 6, 2026

Uh oh!

gh-worker-devflow-routing-ef8351 Bot commented May 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Yun-Kim commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

cit-pr-commenter-54b7da Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codeowners resolved as

Uh oh!

datadog-prod-us1-3 Bot commented May 6, 2026 • edited by datadog-datadog-prod-us1-2 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Yun-Kim commented May 6, 2026

Uh oh!

gh-worker-devflow-routing-ef8351 Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Yun-Kim commented May 4, 2026 •

edited

Loading

cit-pr-commenter-54b7da Bot commented May 4, 2026 •

edited

Loading

datadog-prod-us1-3 Bot commented May 6, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

gh-worker-devflow-routing-ef8351 Bot commented May 6, 2026 •

edited

Loading