test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData#17876
Conversation
…fixture Preparation for migrating the four internal SDK test files (test_llmobs.py, test_llmobs_decorators.py, test_llmobs_service.py, test_propagation.py) onto assert_llmobs_span_data — same pattern as the contrib migrations in #17767, #17788, #17789, #17801, #17805. Setting the env var inside the fixture is benign for tests that still assert on llmobs_events: it only skips the post-enqueue meta_struct scrub in _on_span_finish, so the writer continues to receive projected LLMObsSpanEvents unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the 5 tests that read llmobs_events to read spans directly via test_spans + _get_llmobs_data_metastruct. The remaining tests already assert on span.context._meta or use subprocess-driven distributed header tests and don't go through the projection step. The wire-only _dd.apm_trace_id field is replaced by format_trace_id(span.trace_id), which is its definition (see ddtrace/llmobs/_llmobs.py:633), and span_id is read off the Span directly. parent_id and the LLMObs trace_id come straight from meta_struct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anData
All 51 test items in this file go through the same _expected_llmobs_*
projection helpers — those helpers reconstruct what the wire event
should look like rather than validate the projection step itself, so
asserting on the canonical LLMObsSpanData at meta_struct["_llmobs"] is
equivalent for every assertion the tests make.
Mechanical conversions:
- The 2nd positional arg of _expected_llmobs_* (e.g. "llm", "agent")
becomes span_kind=...
- token_metrics= renamed to metrics=
- is_decorator=True folded into tags={"decorator": "1"} (subset match)
- session_id="..." folded into tags={"session_id": "..."}
- error=, error_message=, error_stack= triple folded into a single
error={"type": ..., "message": ..., "stack": ...} dict, with the
fields read off span.get_tag("error.type"|"error.message"|"error.stack")
for the cases that didn't pin them explicitly
Two unparseable-output tests stayed on a direct meta_struct assertion
(meta.output == {}) since assert_llmobs_span_data skips checks when
output_messages= is None, and the test specifically verifies absence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
60 tests migrate to assert_llmobs_span_data on the canonical payload at meta_struct["_llmobs"], following the same pattern as test_propagation.py and test_llmobs_decorators.py. Two tests intentionally stay on llmobs_events: - test_malformed_span_logs_error_instead_of_raising — the malformed span never reaches _prepare_llmobs_span_data, so wire-event count is the cleanest signal that no event was emitted. - TestLLMIOProcessing.test_processor_omit_span — span_processor suppression is a wire-level side effect (the writer is skipped); the meta_struct payload is still populated. The omit semantics are about what reaches the wire, not what's on the canonical payload. Tests that read APM trace IDs (formerly span_event["_dd"]["apm_trace_id"]) now read format_trace_id(span.trace_id) directly off the underlying APM span, matching the test_propagation.py migration. apm_trace_id is projection-only — it's computed at _llmobs_span_event time, not stored on meta_struct. Tests that previously took only `tracer` as a fixture and pulled `llmobs` in transitively via `llmobs_events` now take `llmobs` directly. Without the LLMObs fixture enabled, _annotate_llmobs_span_data wouldn't reach the meta_struct path. Cleaned up unused `llmobs_events` fixture parameters from a handful of tests that asserted only on side-effects (LLMOBS_SUBMITTED_TAG_KEY tag, local-root trace_id propagation, APM trace dropping). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrate the ~20 tests that asserted on llmobs_events / _expected_llmobs_* to assert_llmobs_span_data on the canonical meta_struct["_llmobs"] payload. Test classes covered: span creation (test_llm_span, test_task_span, etc.), session_id, error handling, ml_app override, annotation-context trace structure, LLMObs↔APM parenting, and TestBuildSpanEventFromMetaStructE2E. Tests intentionally kept on llmobs_events / writer mocks: - All 11 eval-metric-writer tests (test_submit_evaluation_*) — they assert on mock_llmobs_eval_metric_writer.enqueue.assert_called_with() and use _expected_llmobs_eval_metric_event(); these test the eval metric submission API, not the span payload. - test_listener_hooks_enqueue_correct_writer (subprocess writer-routing) - test_llmobs_with_evaluation_runner_does_not_enqueue_non_llm_spans (mocks evaluator_runner.enqueue) - 9 fork tests (subprocess-based, separate assertion style) - test_llmobs_with_evaluator_runner Non-mechanical adaptations (consistent with the test_llmobs.py and test_propagation.py migrations): - span_id is added at projection time (not stored on meta_struct), so span-id assertions read str(span.span_id) directly off the Span. - apm_trace_id is projection-only; reads use format_trace_id(span.trace_id). - status is computed from span.error at projection time; replaced event["status"] == "ok"/"error" with assert span.error == 0/1. - Multi-span tests bind each span to a distinct `as` variable so the per-span assert_llmobs_span_data call has a definite Span reference; previously the tests indexed into the wire event list by ordinal. _expected_llmobs_eval_metric_event remains imported and is still used by the retained eval-metric-writer tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d13692302
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Codeowners resolved as |
…bscripts Address PR review feedback: prefer the typed accessors in ddtrace.llmobs._utils over reaching into the LLMObsSpanData dict at test sites. Same coverage; reads now go through the same API the SDK itself uses. Substitutions: ["trace_id"] -> get_llmobs_trace_id(span) ["parent_id"] -> get_llmobs_parent_id(span) ["name"] -> get_llmobs_span_name(span) ["session_id"] -> get_llmobs_session_id(span) ["tags"]["ml_app"] -> get_llmobs_ml_app(span) ["meta"]["input"] -> get_llmobs_input(span) ["meta"]["output"] -> get_llmobs_output(span) ["meta"]["input"]["messages"] -> get_llmobs_input_messages(span) ["meta"]["input"]["value"] -> get_llmobs_input_value(span) ["meta"]["input"]["prompt"] -> get_llmobs_input_prompt(span) ["meta"]["model_name"] -> get_llmobs_model_name(span) ["meta"]["model_provider"]-> get_llmobs_model_provider(span) ["meta"]["metadata"]["_dd"]["cost_tags"] -> get_llmobs_cost_tags(span) Filter expressions [s for ... if _get_llmobs_data_metastruct(s)] are rewritten as ... if get_llmobs_span_kind(s)] — equivalent (a span with LLMObsSpanData on meta_struct always has a span_kind set) and uses the typed helper. _get_llmobs_data_metastruct still appears in two places where there is no helper: feeding the canonical payload to assert_llmobs_span_data (intentional — that's the helper's input contract), and a single meta.error read in test_error_span (no error helper exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three TestMLApp tests in test_llmobs.py originally asserted that
"ml_app:..." appeared in the projected event's tag list. The previous
helper sweep switched them to get_llmobs_ml_app(span), which reads the
top-level meta_struct["_llmobs"]["ml_app"] field — a different attribute
than the tag the tests are named for. Both fields get set, but the test
names ("test_tag_defaults_to_env_var", "test_tag_overrides_env_var")
make clear the tag form is what's under test.
Switch back to get_llmobs_tags(span)["ml_app"], which reads the tag
dict on meta_struct (the canonical form of what becomes "ml_app:..." in
the projected event tag list).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate the meta_struct scrub in _on_span_finish to a single point
that fires whenever no event will reach the LLMObs backend — covering
all four "won't export" paths:
1. _prepare_llmobs_span_data returns False because meta_struct is empty
2. _prepare_llmobs_span_data returns False because span_kind is missing
(malformed span)
3. _prepare_llmobs_span_data returns False because the user-registered
span_processor returned None (explicit drop)
4. _llmobs_span_event raises and the exception is caught
Previously only the kept path scrubbed (after writer.enqueue), so any
of the above left the LLMObs payload in place on the underlying APM
span. That's a no-op today (the LLMObs writer never enqueued, and APM
doesn't read meta_struct["_llmobs"] yet), but it becomes a leak
post-convergence (MLOB-4925): once APM ships meta_struct["_llmobs"] to
the LLMObs backend, an "omitted/malformed/errored" span whose payload
was left intact would arrive at LLMObs anyway — exactly the data the
SDK said wouldn't be exported.
The new scrub is unconditional (overrides _DD_LLMOBS_TEST_KEEP_META_STRUCT),
because the contract for these paths is "no LLMObs data on this span"
regardless of test mode. The existing kept-path scrub at line 539-540
keeps its existing semantics — that's the only place the env var
suppresses the scrub.
Test changes:
- test_processor_omit_span: rewritten to assert the contract directly
(omit ⇒ _get_llmobs_data_metastruct(span) == {}, kept ⇒ payload intact).
- test_malformed_span_logs_error_instead_of_raising: same shape — drops
llmobs_events and asserts empty meta_struct on the malformed span.
Both rewrites remove the last llmobs_events references in
tests/llmobs/test_llmobs.py and the four migrated SDK test files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
71e53e0 to
f6d1ad1
Compare
…lpers Two cleanups in test_llmobs.py and test_llmobs_service.py: 1. Drop the test_spans fixture parameter from 12 tests that captured spans directly via `with llmobs.X() as span:` and never call test_spans.pop()/pop_traces(). The fixture was a holdover from the contrib pattern (which extracts spans by trace iteration); these tests have direct references and don't need it. 2. Replace assert_llmobs_span_data calls in 11 single-attribute tests with the corresponding get_llmobs_*(span) helper. Each test was originally a singular I/O assertion (input_value, input_messages, output_messages, output_value, metadata, metrics, span_name, span_kind, model_name, model_provider) — assert_llmobs_span_data is overkill for one-attribute checks. Drops the assert_llmobs_span_data import from this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p_level_field
The test's whole subject is "session_id is recoverable from a span set
via the session_id kwarg" — a single-attribute check. Replace
assert_llmobs_span_data(... tags={"session_id": ...}) with
get_llmobs_session_id(span) == session_id, which both reads more
directly and matches the test name (top-level field, not the tag form).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests
Eight span-creation tests in test_llmobs_service.py
(test_llm_span, test_llm_span_no_model_sets_default, test_tool_span,
test_task_span, test_workflow_span, test_agent_span, test_embedding_span,
test_embedding_span_no_model_sets_default) followed a redundant pattern:
with llmobs.X(...) as span:
assert span.name == ... # APM span field
assert get_llmobs_span_kind(span) == ... # LLMObs field (inside)
...
assert_llmobs_span_data(..., span_kind=..., ...) # SAME LLMObs fields, AGAIN
The post-finish assert_llmobs_span_data fully duplicates the inside-block
get_llmobs_* checks (these fields are set at span start by
_activate_llmobs_span and aren't mutated by _normalize_llmobs_meta on
finish). Move the get_llmobs_* checks out of the with block (so they
run on the post-finish span) and drop the assert_llmobs_span_data
calls. APM span.name/resource/span_type checks stay inside the block —
those are independent.
For test_llm_span_no_model_sets_default and test_embedding_span_no_model_sets_default,
the inside-block originally only checked model_name; the after-block
added span_kind and model_provider as well. After the move, all three
fields are checked post-finish via single-attribute helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ruff check --fix is a no-op (passes). ruff format collapses 5 multi-line assert_llmobs_span_data calls in test_ml_app_override into single lines that fit the configured line length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: 7956e93 | Docs | Datadog PR Page | Give us feedback! |
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
The expected merge time in
|
…ata (#17876) ## Summary Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from `meta_struct[\"_llmobs\"]` instead of the projected `LLMObsSpanEvent` captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence ([MLOB-4925](https://datadoghq.atlassian.net/browse/MLOB-4925)). Claude session: `0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` Resume: `claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` 🤖 Generated with [Claude Code](https://claude.com/claude-code) [MLOB-4925]: https://datadoghq.atlassian.net/browse/MLOB-4925?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: yun.kim <yun.kim@datadoghq.com>
…ata (#17876) ## Summary Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from `meta_struct[\"_llmobs\"]` instead of the projected `LLMObsSpanEvent` captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence ([MLOB-4925](https://datadoghq.atlassian.net/browse/MLOB-4925)). Claude session: `0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` Resume: `claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4` 🤖 Generated with [Claude Code](https://claude.com/claude-code) [MLOB-4925]: https://datadoghq.atlassian.net/browse/MLOB-4925?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: yun.kim <yun.kim@datadoghq.com>
Summary
Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from
meta_struct[\"_llmobs\"]instead of the projectedLLMObsSpanEventcaptured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence (MLOB-4925).Claude session:
0d529421-0fef-4f8e-bc6a-ef406a8c9eb4Resume:
claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4🤖 Generated with Claude Code