Skip to content

fix(instr-undici): fix user agent extraction and handle of multiple values on headers #2875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 4, 2025

Conversation

david-luna
Copy link
Contributor

Short description of the changes

Supersedes: #2781

benjaminjkraft and others added 4 commits April 3, 2025 12:43
The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal.

Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases.

Similarly, I also fixed an incorrect case in undici v5 where it would treat a `user-agent-bogus` header as a user agent, but I couldn't write a test case since the tests run with the newer version.
Copy link

codecov bot commented Jun 3, 2025

Codecov Report

Attention: Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.68%. Comparing base (5861dfa) to head (64eaaba).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
plugins/node/instrumentation-undici/src/undici.ts 94.44% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2875      +/-   ##
==========================================
- Coverage   89.69%   89.68%   -0.01%     
==========================================
  Files         186      186              
  Lines        9041     9053      +12     
  Branches     1855     1857       +2     
==========================================
+ Hits         8109     8119      +10     
- Misses        932      934       +2     
Files with missing lines Coverage Δ
plugins/node/instrumentation-undici/src/undici.ts 91.74% <94.44%> (-0.53%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple nits.

if (key.toLowerCase() === 'user-agent') {
// user-agent should only appear once per the spec, but the library doesn't
// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) FWIW, from the curl docs the last specified User-Agent wins:

If --user-agent is provided several times, the last set value is used.

So we could consider taking the last value if it is an array. I am fine with the current code, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a similar conversation in #2781 (comment) but we did not get into a conclusion. IMO is not a bad idea to follow what other well known tools like curl are doing. I'll change it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 9a7f8d7

@@ -218,25 +254,14 @@ export class UndiciInstrumentation extends InstrumentationBase<UndiciInstrumenta
}

// Get user agent from headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Not for this PR, but https://opentelemetry.io/docs/specs/semconv/http/http-spans/ says that user_agent.original is Opt-In. I.e. collecting it should be off by default. We should do that in a separate (breaking) PR change.

if (headersToAttribs.has(name)) {
spanAttributes[`http.request.header.${name}`] = value.trim();
spanAttributes[`http.request.header.${name}`] = value
.toString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: To be extremely spec picky, https://www.rfc-editor.org/rfc/rfc9110.html#section-5.3 says (emphasis mine):

A recipient MAY combine multiple field lines within a field section that have the same field name into one field line, without changing the semantics of the message, by appending each subsequent field line value to the initial field line value in order, separated by a comma (",") and optional whitespace (OWS, defined in Section 5.6.3). For consistency, use comma SP.

So I would say it is slightly preferred to have the joining of an array of values use ', ' rather than the ',' that is implicitly being used via Array.prototype.toString().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the preferred separator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-luna
Copy link
Contributor Author

Last test run revealed the generator function wasn't the best suited for the job. In some scenarios (headers as string) the generator returned multiple values with the same key property since it was yielding the result for each line.

Instrumentation needs to process all lines to make sure it collects all header values for a given name. So I've changed the implementation to return a map instead in commit 01c1c86

@david-luna david-luna requested a review from trentm June 3, 2025 22:31
@david-luna
Copy link
Contributor Author

Sorry @trentm but I prefer to have a second review since I've changed a bit the way headers are parsed. I'm still able to merge although I pushed a couple of commits after your approval. Is that supposed to work like this?


if (userAgent) {
if (userAgentValues) {
const userAgent = Array.isArray(userAgentValues)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Choosing the last user-agent value is an "interesting thing" to have a comment on, I think. Could point to the curl example as justification.

if (headersToAttribs.has(name)) {
spanAttributes[`http.request.header.${name}`] = value.trim();
const attrValue = Array.isArray(value) ? value.join(', ') : value;
spanAttributes[`http.request.header.${name}`] = attrValue.trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think the .trim() is necessary now because parseRequestHeaders does that.

@david-luna david-luna enabled auto-merge (squash) June 4, 2025 17:54
@david-luna david-luna disabled auto-merge June 4, 2025 18:11
@david-luna david-luna merged commit 8820f65 into open-telemetry:main Jun 4, 2025
23 checks passed
@dyladan dyladan mentioned this pull request Jun 4, 2025
@david-luna david-luna deleted the fix-undici-user-agent branch June 4, 2025 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants