OTEP: correlating OBI traces to profiles#4855

Open
mmat11 wants to merge 3 commits into open-telemetry:main from mmat11:matt/profile-trace-obi-otep
@mmat11 mmat11 commented Jan 27, 2026

Changes

This OTEP introduces a standard communication channel and a specification for correlating profiles to opentelemetry-ebpf-instrumentation (OBI) traces.

There is a PoC implementation at open-telemetry/opentelemetry-ebpf-instrumentation#1184 and coralogix/opentelemetry-ebpf-profiler#1.

  • Related issues #
  • Related OTEP(s) #212
  • Links to the prototypes (when adding or changing features)
  • CHANGELOG.md file updated for non-trivial changes
    • For trivial changes, include [chore] in the PR title to skip the changelog check
  • Spec compliance matrix updated if necessary

Opening as Draft for submission to the relevant SIGs first

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
@mmat11 mmat11 force-pushed the matt/profile-trace-obi-otep branch from d88793d to 5c20f4e Compare January 27, 2026 15:50
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
@mmat11 mmat11 marked this pull request as ready for review January 29, 2026 23:45
@mmat11 mmat11 requested review from a team as code owners January 29, 2026 23:45
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
@ivoanjo ivoanjo left a comment

👍 This is quite nimble and makes a lot of sense!

@@ -0,0 +1,55 @@
# Correlating Profiles to OBI Traces
Member

Would it be possible to provide additional context on correlation in general (e.g. links, if that context already exists somewhere else):

  • why OBI traces are targeted here and not the SDK traces?
  • how/if OBI traces are correlated to SDK traces?
  • how SDK traces are correlated to profiles

Also, what happens when the application is also instrumented, and which mechanism would win?

Author

> why OBI traces are targeted here and not the SDK traces?

There are other proposals that target SDK traces specifically; see #4719 and https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

> how/if OBI traces are correlated to SDK traces?

I don't think they are. They have different ways of doing the same thing (context propagation), and if a trace exists it will ideally be inherited by both OBI and the SDKs.

> how SDK traces are correlated to profiles

This OTEP is OBI-specific; for SDKs, I believe the work/proposal is located here: https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

> Also, what happens when the application is also instrumented, and which mechanism would win?

I think it should be something like:

trace_ctx = try_sdk()
if not trace_ctx:
    trace_ctx = try_obi()

or vice versa. If both sources have a trace context and the two differ, perhaps that's a bug.
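
A minimal sketch of that fallback order, in Python. `TraceContext`, `resolve_trace_context`, and the two source callables are hypothetical names for illustration; they are not part of OBI, the profiler, or any SDK. Treating a trace ID mismatch as an error is also an assumption, taken from the "perhaps it's a bug" remark, not a decided policy.

```python
from typing import Callable, NamedTuple, Optional


class TraceContext(NamedTuple):
    """Hypothetical holder for the two identifiers discussed in this thread."""
    trace_id: bytes  # 16 raw bytes
    span_id: bytes   # 8 raw bytes


# A source returns a context, or None when it has nothing to report.
Source = Callable[[], Optional[TraceContext]]


def resolve_trace_context(try_sdk: Source, try_obi: Source) -> Optional[TraceContext]:
    """Prefer the SDK context and fall back to OBI.

    Assumption: if both sources report a context and the trace IDs differ,
    that is surfaced as an error instead of silently picking one.
    """
    sdk_ctx = try_sdk()
    obi_ctx = try_obi()
    if (
        sdk_ctx is not None
        and obi_ctx is not None
        and sdk_ctx.trace_id != obi_ctx.trace_id
    ):
        raise ValueError("SDK and OBI report different trace IDs")
    return sdk_ctx if sdk_ctx is not None else obi_ctx
```

Swapping the two arguments gives the "vice versa" ordering; the conflict check is the same either way.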

👋 I'm one of the authors of the gdoc mentioned -- we plan to turn it into an OTEP soon. The TL;DR of why we need OBI specifics is that due to the way OBI works it's awkward to use the same mechanism we're planning for regular SDKs and vice-versa.

See also #4719 (comment) for more on this discussion.

Member

@christos68k christos68k Feb 5, 2026

@lmolkova We discussed the priority/fidelity (who wins?) issue at today's Profiling SIG (also previously raised by me in the proposed implementation here)

Given that this OTEP focuses on a specific technical solution (fast data exchange between two OTel components), should we attempt to answer the priority/fidelity question here or somewhere more general? In today's SIG, we discussed that the same clash can be present in the rest of OTel, e.g. if one has multiple layers of instrumentation.

Member

I think having more context in the OTEP is helpful and increases the chances of people reviewing it.
It's important to understand the implications of this OTEP on general correlation between OBI, SDKs, and profiling.

Author

> It's important to understand the implications of this OTEP on general correlation between OBI, SDKs, and profiling.

In my opinion it belongs in a more general document, as it would dictate the correlation/priority/etc. of generic readers (e.g. the profiler) with writers (e.g. OBI, SDKs). Let me know if I should add (or clarify) something OBI-specific.

Comment on lines +52 to +53
u8 trace_id[16];
u8 span_id[8];
Member

Is there a reason we don't type these as

Suggested change
- u8 trace_id[16];
- u8 span_id[8];
+ u64 trace_id[2];
+ u64 span_id;

Yes. These are opaque 16/8-byte identifiers, not integers. The canonical form (including traceparent) is defined over raw bytes, and we manipulate them as bytes in eBPF and in Go ([16]byte / [8]byte). Using u8[16] / u8[8] keeps the in-memory layout identical to the wire format and the userspace representation.
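
As an illustration of the byte-oriented view, here is a small Python sketch that reads a record and formats a W3C `traceparent` value with no integer conversion. It assumes a hypothetical 24-byte record holding only the two fields under discussion (`u8 trace_id[16]` followed by `u8 span_id[8]`); the real shared structure may carry more. `to_traceparent` is an illustrative name.

```python
import struct

# Assumed layout: 16 raw trace ID bytes followed by 8 raw span ID bytes.
# "s" fields are plain byte strings, so no byte order is involved.
RECORD = struct.Struct("16s8s")


def to_traceparent(record: bytes, sampled: bool = True) -> str:
    """Format a traceparent value (version 00) straight from the raw bytes."""
    trace_id, span_id = RECORD.unpack(record)
    flags = "01" if sampled else "00"
    return f"00-{trace_id.hex()}-{span_id.hex()}-{flags}"


record = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736" "00f067aa0ba902b7")
print(to_traceparent(record))
# prints "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```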

If we type them as u64 / u64[2], we introduce endianness concerns. We'd have to define hi/lo ordering and a canonical byte order, and consistently convert at every kernel <-> userspace boundary and during header encoding. That's unnecessary complexity and an easy source of subtle cross-arch bugs, especially since the protocol is byte-ordered, not integer-ordered.

We don't perform arithmetic on these values, so there's no real benefit to representing them as integers. As a side note, keeping them as byte arrays also avoids type-punning/strict-aliasing issues from casting between u64* and byte buffers.
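
To make the endianness concern concrete, a short Python sketch: the same 8 span ID bytes read as a `u64` give different integers depending on byte order, and writing the integer back with a different order than it was read with changes the hex form. `as_u64` and `hex_via_u64` are illustrative helpers, not part of any OTel component.

```python
SPAN_ID = bytes.fromhex("00f067aa0ba902b7")


def as_u64(raw: bytes, byteorder: str) -> int:
    """Interpret 8 raw bytes as an unsigned 64-bit integer."""
    return int.from_bytes(raw, byteorder)


def hex_via_u64(raw: bytes, read_order: str, write_order: str) -> str:
    """Round-trip the ID through an integer, as a u64-typed field would."""
    return as_u64(raw, read_order).to_bytes(8, write_order).hex()


# The byte view is unambiguous.
print(SPAN_ID.hex())                            # 00f067aa0ba902b7

# The integer view depends on the reader's byte order.
print(hex(as_u64(SPAN_ID, "big")))              # 0xf067aa0ba902b7
print(hex(as_u64(SPAN_ID, "little")))           # 0xb702a90baa67f000

# Mixing orders across a boundary silently corrupts the ID.
print(hex_via_u64(SPAN_ID, "little", "big"))    # b702a90baa67f000
```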
