Skip to content

test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData#17876

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 14 commits into
mainfrom
yunkim/llmobs-test-meta-struct-internal
May 6, 2026
Merged

test(llmobs): migrate tests/llmobs SDK tests to assert on LLMObsSpanData#17876
gh-worker-dd-mergequeue-cf854d[bot] merged 14 commits into
mainfrom
yunkim/llmobs-test-meta-struct-internal

Conversation

@Yun-Kim
Copy link
Copy Markdown
Contributor

@Yun-Kim Yun-Kim commented May 4, 2026

Summary

Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from meta_struct[\"_llmobs\"] instead of the projected LLMObsSpanEvent captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence (MLOB-4925).

Claude session: 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4
Resume: claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4

🤖 Generated with Claude Code

Yun-Kim and others added 5 commits May 4, 2026 16:49
…fixture

Preparation for migrating the four internal SDK test files
(test_llmobs.py, test_llmobs_decorators.py, test_llmobs_service.py,
test_propagation.py) onto assert_llmobs_span_data — same pattern as
the contrib migrations in #17767, #17788, #17789, #17801, #17805.

Setting the env var inside the fixture is benign for tests that still
assert on llmobs_events: it only skips the post-enqueue meta_struct
scrub in _on_span_finish, so the writer continues to receive projected
LLMObsSpanEvents unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the 5 tests that read llmobs_events to read spans directly via
test_spans + _get_llmobs_data_metastruct. The remaining tests already
assert on span.context._meta or use subprocess-driven distributed
header tests and don't go through the projection step.

The wire-only _dd.apm_trace_id field is replaced by
format_trace_id(span.trace_id), which is its definition (see
ddtrace/llmobs/_llmobs.py:633), and span_id is read off the Span
directly. parent_id and the LLMObs trace_id come straight from
meta_struct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anData

All 51 test items in this file go through the same _expected_llmobs_*
projection helpers — those helpers reconstruct what the wire event
should look like rather than validate the projection step itself, so
asserting on the canonical LLMObsSpanData at meta_struct["_llmobs"] is
equivalent for every assertion the tests make.

Mechanical conversions:
- The 2nd positional arg of _expected_llmobs_* (e.g. "llm", "agent")
  becomes span_kind=...
- token_metrics= renamed to metrics=
- is_decorator=True folded into tags={"decorator": "1"} (subset match)
- session_id="..." folded into tags={"session_id": "..."}
- error=, error_message=, error_stack= triple folded into a single
  error={"type": ..., "message": ..., "stack": ...} dict, with the
  fields read off span.get_tag("error.type"|"error.message"|"error.stack")
  for the cases that didn't pin them explicitly

Two unparseable-output tests stayed on a direct meta_struct assertion
(meta.output == {}) since assert_llmobs_span_data skips checks when
output_messages= is None, and the test specifically verifies absence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
60 tests migrate to assert_llmobs_span_data on the canonical payload at
meta_struct["_llmobs"], following the same pattern as test_propagation.py
and test_llmobs_decorators.py.

Two tests intentionally stay on llmobs_events:
- test_malformed_span_logs_error_instead_of_raising — the malformed span
  never reaches _prepare_llmobs_span_data, so wire-event count is the
  cleanest signal that no event was emitted.
- TestLLMIOProcessing.test_processor_omit_span — span_processor
  suppression is a wire-level side effect (the writer is skipped); the
  meta_struct payload is still populated. The omit semantics are about
  what reaches the wire, not what's on the canonical payload.

Tests that read APM trace IDs (formerly span_event["_dd"]["apm_trace_id"])
now read format_trace_id(span.trace_id) directly off the underlying
APM span, matching the test_propagation.py migration. apm_trace_id is
projection-only — it's computed at _llmobs_span_event time, not stored
on meta_struct.

Tests that previously took only `tracer` as a fixture and pulled `llmobs`
in transitively via `llmobs_events` now take `llmobs` directly. Without
the LLMObs fixture enabled, _annotate_llmobs_span_data wouldn't reach
the meta_struct path.

Cleaned up unused `llmobs_events` fixture parameters from a handful of
tests that asserted only on side-effects (LLMOBS_SUBMITTED_TAG_KEY tag,
local-root trace_id propagation, APM trace dropping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrate the ~20 tests that asserted on llmobs_events / _expected_llmobs_*
to assert_llmobs_span_data on the canonical meta_struct["_llmobs"]
payload. Test classes covered: span creation (test_llm_span,
test_task_span, etc.), session_id, error handling, ml_app override,
annotation-context trace structure, LLMObs↔APM parenting, and
TestBuildSpanEventFromMetaStructE2E.

Tests intentionally kept on llmobs_events / writer mocks:
- All 11 eval-metric-writer tests (test_submit_evaluation_*) — they
  assert on mock_llmobs_eval_metric_writer.enqueue.assert_called_with()
  and use _expected_llmobs_eval_metric_event(); these test the eval
  metric submission API, not the span payload.
- test_listener_hooks_enqueue_correct_writer (subprocess writer-routing)
- test_llmobs_with_evaluation_runner_does_not_enqueue_non_llm_spans
  (mocks evaluator_runner.enqueue)
- 9 fork tests (subprocess-based, separate assertion style)
- test_llmobs_with_evaluator_runner

Non-mechanical adaptations (consistent with the test_llmobs.py and
test_propagation.py migrations):
- span_id is added at projection time (not stored on meta_struct), so
  span-id assertions read str(span.span_id) directly off the Span.
- apm_trace_id is projection-only; reads use format_trace_id(span.trace_id).
- status is computed from span.error at projection time; replaced
  event["status"] == "ok"/"error" with assert span.error == 0/1.
- Multi-span tests bind each span to a distinct `as` variable so the
  per-span assert_llmobs_span_data call has a definite Span reference;
  previously the tests indexed into the wire event list by ordinal.

_expected_llmobs_eval_metric_event remains imported and is still used
by the retained eval-metric-writer tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yun-Kim Yun-Kim requested a review from a team as a code owner May 4, 2026 21:44
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d13692302

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread tests/llmobs/test_llmobs_service.py Outdated
@cit-pr-commenter-54b7da
Copy link
Copy Markdown

cit-pr-commenter-54b7da Bot commented May 4, 2026

Codeowners resolved as

ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
tests/llmobs/conftest.py                                                @DataDog/ml-observability
tests/llmobs/test_llmobs.py                                             @DataDog/ml-observability
tests/llmobs/test_llmobs_decorators.py                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_propagation.py                                        @DataDog/ml-observability

Yun-Kim and others added 3 commits May 6, 2026 10:46
…bscripts

Address PR review feedback: prefer the typed accessors in
ddtrace.llmobs._utils over reaching into the LLMObsSpanData dict at
test sites. Same coverage; reads now go through the same API the SDK
itself uses.

Substitutions:
  ["trace_id"]              -> get_llmobs_trace_id(span)
  ["parent_id"]             -> get_llmobs_parent_id(span)
  ["name"]                  -> get_llmobs_span_name(span)
  ["session_id"]            -> get_llmobs_session_id(span)
  ["tags"]["ml_app"]        -> get_llmobs_ml_app(span)
  ["meta"]["input"]         -> get_llmobs_input(span)
  ["meta"]["output"]        -> get_llmobs_output(span)
  ["meta"]["input"]["messages"] -> get_llmobs_input_messages(span)
  ["meta"]["input"]["value"]    -> get_llmobs_input_value(span)
  ["meta"]["input"]["prompt"]   -> get_llmobs_input_prompt(span)
  ["meta"]["model_name"]    -> get_llmobs_model_name(span)
  ["meta"]["model_provider"]-> get_llmobs_model_provider(span)
  ["meta"]["metadata"]["_dd"]["cost_tags"] -> get_llmobs_cost_tags(span)

Filter expressions [s for ... if _get_llmobs_data_metastruct(s)] are
rewritten as ... if get_llmobs_span_kind(s)] — equivalent (a span with
LLMObsSpanData on meta_struct always has a span_kind set) and uses the
typed helper.

_get_llmobs_data_metastruct still appears in two places where there is
no helper: feeding the canonical payload to assert_llmobs_span_data
(intentional — that's the helper's input contract), and a single
meta.error read in test_error_span (no error helper exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three TestMLApp tests in test_llmobs.py originally asserted that
"ml_app:..." appeared in the projected event's tag list. The previous
helper sweep switched them to get_llmobs_ml_app(span), which reads the
top-level meta_struct["_llmobs"]["ml_app"] field — a different attribute
than the tag the tests are named for. Both fields get set, but the test
names ("test_tag_defaults_to_env_var", "test_tag_overrides_env_var")
make clear the tag form is what's under test.

Switch back to get_llmobs_tags(span)["ml_app"], which reads the tag
dict on meta_struct (the canonical form of what becomes "ml_app:..." in
the projected event tag list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate the meta_struct scrub in _on_span_finish to a single point
that fires whenever no event will reach the LLMObs backend — covering
all four "won't export" paths:

  1. _prepare_llmobs_span_data returns False because meta_struct is empty
  2. _prepare_llmobs_span_data returns False because span_kind is missing
     (malformed span)
  3. _prepare_llmobs_span_data returns False because the user-registered
     span_processor returned None (explicit drop)
  4. _llmobs_span_event raises and the exception is caught

Previously only the kept path scrubbed (after writer.enqueue), so any
of the above left the LLMObs payload in place on the underlying APM
span. That's a no-op today (the LLMObs writer never enqueued, and APM
doesn't read meta_struct["_llmobs"] yet), but it becomes a leak
post-convergence (MLOB-4925): once APM ships meta_struct["_llmobs"] to
the LLMObs backend, an "omitted/malformed/errored" span whose payload
was left intact would arrive at LLMObs anyway — exactly the data the
SDK said wouldn't be exported.

The new scrub is unconditional (overrides _DD_LLMOBS_TEST_KEEP_META_STRUCT),
because the contract for these paths is "no LLMObs data on this span"
regardless of test mode. The existing kept-path scrub at line 539-540
keeps its existing semantics — that's the only place the env var
suppresses the scrub.

Test changes:
- test_processor_omit_span: rewritten to assert the contract directly
  (omit ⇒ _get_llmobs_data_metastruct(span) == {}, kept ⇒ payload intact).
- test_malformed_span_logs_error_instead_of_raising: same shape — drops
  llmobs_events and asserts empty meta_struct on the malformed span.

Both rewrites remove the last llmobs_events references in
tests/llmobs/test_llmobs.py and the four migrated SDK test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yun-Kim Yun-Kim force-pushed the yunkim/llmobs-test-meta-struct-internal branch from 71e53e0 to f6d1ad1 Compare May 6, 2026 15:42
Yun-Kim and others added 4 commits May 6, 2026 12:32
…lpers

Two cleanups in test_llmobs.py and test_llmobs_service.py:

1. Drop the test_spans fixture parameter from 12 tests that captured
   spans directly via `with llmobs.X() as span:` and never call
   test_spans.pop()/pop_traces(). The fixture was a holdover from the
   contrib pattern (which extracts spans by trace iteration); these
   tests have direct references and don't need it.

2. Replace assert_llmobs_span_data calls in 11 single-attribute tests
   with the corresponding get_llmobs_*(span) helper. Each test was
   originally a singular I/O assertion (input_value, input_messages,
   output_messages, output_value, metadata, metrics, span_name,
   span_kind, model_name, model_provider) — assert_llmobs_span_data is
   overkill for one-attribute checks. Drops the assert_llmobs_span_data
   import from this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p_level_field

The test's whole subject is "session_id is recoverable from a span set
via the session_id kwarg" — a single-attribute check. Replace
assert_llmobs_span_data(... tags={"session_id": ...}) with
get_llmobs_session_id(span) == session_id, which both reads more
directly and matches the test name (top-level field, not the tag form).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests

Eight span-creation tests in test_llmobs_service.py
(test_llm_span, test_llm_span_no_model_sets_default, test_tool_span,
test_task_span, test_workflow_span, test_agent_span, test_embedding_span,
test_embedding_span_no_model_sets_default) followed a redundant pattern:

  with llmobs.X(...) as span:
      assert span.name == ...                       # APM span field
      assert get_llmobs_span_kind(span) == ...      # LLMObs field (inside)
      ...
  assert_llmobs_span_data(..., span_kind=..., ...)  # SAME LLMObs fields, AGAIN

The post-finish assert_llmobs_span_data fully duplicates the inside-block
get_llmobs_* checks (these fields are set at span start by
_activate_llmobs_span and aren't mutated by _normalize_llmobs_meta on
finish). Move the get_llmobs_* checks out of the with block (so they
run on the post-finish span) and drop the assert_llmobs_span_data
calls. APM span.name/resource/span_type checks stay inside the block —
those are independent.

For test_llm_span_no_model_sets_default and test_embedding_span_no_model_sets_default,
the inside-block originally only checked model_name; the after-block
added span_kind and model_provider as well. After the move, all three
fields are checked post-finish via single-attribute helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yun-Kim Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label May 6, 2026
ruff check --fix is a no-op (passes). ruff format collapses 5
multi-line assert_llmobs_span_data calls in test_ml_app_override into
single lines that fit the configured line length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@datadog-prod-us1-3
Copy link
Copy Markdown

datadog-prod-us1-3 Bot commented May 6, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 7956e93 | Docs | Datadog PR Page | Give us feedback!

@Yun-Kim
Copy link
Copy Markdown
Contributor Author

Yun-Kim commented May 6, 2026

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 6, 2026

View all feedbacks in Devflow UI.

2026-05-06 19:19:59 UTC ℹ️ Start processing command /merge


2026-05-06 19:20:09 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-05-06 19:24:14 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 53m (p90).


2026-05-06 20:01:31 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit b8b617c into main May 6, 2026
559 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the yunkim/llmobs-test-meta-struct-internal branch May 6, 2026 20:01
dubloom pushed a commit that referenced this pull request May 11, 2026
…ata (#17876)

## Summary

Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from `meta_struct[\"_llmobs\"]` instead of the projected `LLMObsSpanEvent` captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence ([MLOB-4925](https://datadoghq.atlassian.net/browse/MLOB-4925)).


Claude session: `0d529421-0fef-4f8e-bc6a-ef406a8c9eb4`
Resume: `claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

[MLOB-4925]: https://datadoghq.atlassian.net/browse/MLOB-4925?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: yun.kim <yun.kim@datadoghq.com>
P403n1x87 pushed a commit that referenced this pull request May 14, 2026
…ata (#17876)

## Summary

Extends the assert_llmobs_span_data migration (#17767, #17788, #17789, #17801, #17805) from contribs to the four "internal" SDK test files. Tests now read the canonical LLMObs payload from `meta_struct[\"_llmobs\"]` instead of the projected `LLMObsSpanEvent` captured by the writer mock — matching the SDK contract and surviving the agentless / agent-mode export-path split (cf. #17854) en route to APM/LLMObs convergence ([MLOB-4925](https://datadoghq.atlassian.net/browse/MLOB-4925)).


Claude session: `0d529421-0fef-4f8e-bc6a-ef406a8c9eb4`
Resume: `claude --resume 0d529421-0fef-4f8e-bc6a-ef406a8c9eb4`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

[MLOB-4925]: https://datadoghq.atlassian.net/browse/MLOB-4925?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: yun.kim <yun.kim@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/no-changelog A changelog entry is not required for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants