-
Notifications
You must be signed in to change notification settings - Fork 580
fix(instr-undici): fix user agent extraction and handle of multiple values on headers #2875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(instr-undici): fix user agent extraction and handle of multiple values on headers #2875
Conversation
The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal. Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases. Similarly, I also fixed an incorrect case in undici v5 where it would treat a `user-agent-bogus` header as a user agent, but I couldn't write a test case since the tests run with the newer version.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2875 +/- ##
==========================================
- Coverage 89.69% 89.68% -0.01%
==========================================
Files 186 186
Lines 9041 9053 +12
Branches 1855 1857 +2
==========================================
+ Hits 8109 8119 +10
- Misses 932 934 +2
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple nits.
if (key.toLowerCase() === 'user-agent') { | ||
// user-agent should only appear once per the spec, but the library doesn't | ||
// prevent passing it multiple times, so we handle that to be safe. | ||
const userAgent = Array.isArray(value) ? value[0] : value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) FWIW, from the curl
docs the last specified User-Agent wins:
If --user-agent is provided several times, the last set value is used.
So we could consider taking the last value if it is an array. I am fine with the current code, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a similar conversation in #2781 (comment) but we did not get into a conclusion. IMO is not a bad idea to follow what other well known tools like curl
are doing. I'll change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 9a7f8d7
@@ -218,25 +254,14 @@ export class UndiciInstrumentation extends InstrumentationBase<UndiciInstrumenta | |||
} | |||
|
|||
// Get user agent from headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Not for this PR, but https://opentelemetry.io/docs/specs/semconv/http/http-spans/ says that user_agent.original
is Opt-In. I.e. collecting it should be off by default. We should do that in a separate (breaking) PR change.
if (headersToAttribs.has(name)) { | ||
spanAttributes[`http.request.header.${name}`] = value.trim(); | ||
spanAttributes[`http.request.header.${name}`] = value | ||
.toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: To be extremely spec picky, https://www.rfc-editor.org/rfc/rfc9110.html#section-5.3 says (emphasis mine):
A recipient MAY combine multiple field lines within a field section that have the same field name into one field line, without changing the semantics of the message, by appending each subsequent field line value to the initial field line value in order, separated by a comma (",") and optional whitespace (OWS, defined in Section 5.6.3). For consistency, use comma SP.
So I would say it is slightly preferred to have the joining of an array of values use ', '
rather than the ','
that is implicitly being used via Array.prototype.toString()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with the preferred separator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last test run revealed the generator function wasn't the best suited for the job. In some scenarios (headers as string) the generator returned multiple values with the same Instrumentation needs to process all lines to make sure it collects all header values for a given name. So I've changed the implementation to return a map instead in commit 01c1c86 |
Sorry @trentm but I prefer to have a second review since I've changed a bit the way headers are parsed. I'm still able to merge although I pushed a couple of commits after your approval. Is that supposed to work like this? |
|
||
if (userAgent) { | ||
if (userAgentValues) { | ||
const userAgent = Array.isArray(userAgentValues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Choosing the last user-agent value is an "interesting thing" to have a comment on, I think. Could point to the curl
example as justification.
if (headersToAttribs.has(name)) { | ||
spanAttributes[`http.request.header.${name}`] = value.trim(); | ||
const attrValue = Array.isArray(value) ? value.join(', ') : value; | ||
spanAttributes[`http.request.header.${name}`] = attrValue.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think the .trim()
is necessary now because parseRequestHeaders
does that.
Short description of the changes
Supersedes: #2781