Skip to content

fix(instr-undici): fix user agent extraction and handle of multiple values on headers #2875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 4, 2025
Merged
5 changes: 3 additions & 2 deletions plugins/node/instrumentation-undici/src/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,10 @@ export interface UndiciRequest {
path: string;
/**
* Serialized string of headers in the form `name: value\r\n` for v5
* Array of strings v6
* Array of strings `[key1, value1, key2, value2]`, where values are
* `string | string[]` for v6
*/
headers: string | string[];
headers: string | (string | string[])[];
/**
* Helper method to add headers (from v6)
*/
Expand Down
85 changes: 49 additions & 36 deletions plugins/node/instrumentation-undici/src/undici.ts
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,42 @@
});
}

/**
* Yield an object { key, value } for each header in the request. Skips
* likely-invalid headers. Multi-valued headers are passed through.
*/
private *parseRequestHeaders(
request: UndiciRequest
): Generator<{ key: string; value: string | string[] }> {
if (Array.isArray(request.headers)) {
// headers are an array [k1, v2, k2, v2] (undici v6+)
for (let i = 0; i < request.headers.length; i += 2) {
const key = request.headers[i];
if (typeof key !== 'string') {
// Shouldn't happen, but the types don't know that, and let's be safe
continue;

Check warning on line 181 in plugins/node/instrumentation-undici/src/undici.ts

View check run for this annotation

Codecov / codecov/patch

plugins/node/instrumentation-undici/src/undici.ts#L181

Added line #L181 was not covered by tests
}
yield { key, value: request.headers[i + 1] };
}
} else if (typeof request.headers === 'string') {
// headers are a raw string (undici v5)
const headers = request.headers.split('\r\n');
for (const line of headers) {
if (!line) {
continue;
}
const colonIndex = line.indexOf(':');
if (colonIndex === -1) {
// Invalid header? Probably this can't happen, but again let's be safe.
continue;

Check warning on line 195 in plugins/node/instrumentation-undici/src/undici.ts

View check run for this annotation

Codecov / codecov/patch

plugins/node/instrumentation-undici/src/undici.ts#L195

Added line #L195 was not covered by tests
}
const key = line.substring(0, colonIndex);
const value = line.substring(colonIndex + 1).trim();
yield { key, value };
}
}
}

// This is the 1st message we receive for each request (fired after request creation). Here we will
// create the span and populate some atttributes, then link the span to the request for further
// span processing
Expand Down Expand Up @@ -218,25 +254,14 @@
}

// Get user agent from headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Not for this PR, but https://opentelemetry.io/docs/specs/semconv/http/http-spans/ says that user_agent.original is Opt-In. I.e. collecting it should be off by default. We should do that in a separate (breaking) PR change.

let userAgent;
if (Array.isArray(request.headers)) {
const idx = request.headers.findIndex(
h => h.toLowerCase() === 'user-agent'
);
if (idx >= 0) {
userAgent = request.headers[idx + 1];
for (const { key, value } of this.parseRequestHeaders(request)) {
if (key.toLowerCase() === 'user-agent') {
// user-agent should only appear once per the spec, but the library doesn't
// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) FWIW, from the curl docs the last specified User-Agent wins:

If --user-agent is provided several times, the last set value is used.

So we could consider taking the last value if it is an array. I am fine with the current code, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a similar conversation in #2781 (comment) but we did not get into a conclusion. IMO is not a bad idea to follow what other well known tools like curl are doing. I'll change it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 9a7f8d7

attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent;
break;
}
} else if (typeof request.headers === 'string') {
const headers = request.headers.split('\r\n');
const uaHeader = headers.find(h =>
h.toLowerCase().startsWith('user-agent')
);
userAgent =
uaHeader && uaHeader.substring(uaHeader.indexOf(':') + 1).trim();
}

if (userAgent) {
attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent;
}

// Get attributes from the hook if present
Expand Down Expand Up @@ -330,26 +355,14 @@
config.headersToSpanAttributes.requestHeaders.map(n => n.toLowerCase())
);

// headers could be in form
// ['name: value', ...] for v5
// ['name', 'value', ...] for v6
const rawHeaders = Array.isArray(request.headers)
? request.headers
: request.headers.split('\r\n');
rawHeaders.forEach((h, idx) => {
const sepIndex = h.indexOf(':');
const hasSeparator = sepIndex !== -1;
const name = (
hasSeparator ? h.substring(0, sepIndex) : h
).toLowerCase();
const value = hasSeparator
? h.substring(sepIndex + 1)
: rawHeaders[idx + 1];

for (const { key, value } of this.parseRequestHeaders(request)) {
const name = key.toLowerCase();
if (headersToAttribs.has(name)) {
spanAttributes[`http.request.header.${name}`] = value.trim();
spanAttributes[`http.request.header.${name}`] = value
.toString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: To be extremely spec picky, https://www.rfc-editor.org/rfc/rfc9110.html#section-5.3 says (emphasis mine):

A recipient MAY combine multiple field lines within a field section that have the same field name into one field line, without changing the semantics of the message, by appending each subsequent field line value to the initial field line value in order, separated by a comma (",") and optional whitespace (OWS, defined in Section 5.6.3). For consistency, use comma SP.

So I would say it is slightly preferred to have the joining of an array of values use ', ' rather than the ',' that is implicitly being used via Array.prototype.toString().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the preferred separator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.trim();
}
});
}
}

span.setAttributes(spanAttributes);
Expand Down
108 changes: 76 additions & 32 deletions plugins/node/instrumentation-undici/test/undici.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -797,43 +797,87 @@ describe('UndiciInstrumentation `undici` tests', function () {
});
});

it('should not report an user-agent if it was not defined', async function () {
let spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 0);
const userAgentRequests: Array<{
name: string;
headers: Record<string, string | string[] | undefined>;
expectedUserAgent: string | undefined;
}> = [
{
name: 'no user-agent',
headers: { 'foo-client': 'bar' },
expectedUserAgent: undefined,
},
{
name: 'a user-agent',
headers: { 'foo-client': 'bar', 'user-agent': 'custom' },
expectedUserAgent: 'custom',
},
{
name: 'explicitly-undefined user-agent',
headers: { 'foo-client': 'bar', 'user-agent': undefined },
expectedUserAgent: undefined,
},
// contra the spec, but we shouldn't crash
{
name: 'multiple user-agents',
headers: {
'foo-client': 'bar',
'user-agent': ['agent', 'other-agent'],
},
expectedUserAgent: 'agent',
},
{
name: 'another header with value user-agent',
headers: { 'foo-client': 'user-agent', 'user-agent': 'custom' },
expectedUserAgent: 'custom',
},
{
name: 'another header with multiple values',
headers: { 'foo-client': ['one', 'two'], 'user-agent': 'custom' },
expectedUserAgent: 'custom',
},
{
name: 'another header with explicitly-undefined value',
headers: { 'foo-client': undefined, 'user-agent': 'custom' },
expectedUserAgent: 'custom',
},
];

// Do some requests
const headers = {
'foo-client': 'bar',
};
for (const testCase of userAgentRequests) {
it(`should report the correct user-agent when the request has ${testCase.name}`, async function () {
let spans = memoryExporter.getFinishedSpans();
assert.strictEqual(spans.length, 0);

const queryRequestUrl = `${protocol}://${hostname}:${mockServer.port}/?query=test`;
const queryResponse = await undici.request(queryRequestUrl, { headers });
await consumeResponseBody(queryResponse.body);
const queryRequestUrl = `${protocol}://${hostname}:${mockServer.port}/?query=test`;
const queryResponse = await undici.request(queryRequestUrl, {
headers: testCase.headers,
});
await consumeResponseBody(queryResponse.body);

assert.ok(
queryResponse.headers['propagation-error'] == null,
'propagation is set for instrumented requests'
);
assert.ok(
queryResponse.headers['propagation-error'] == null,
'propagation is set for instrumented requests'
);

spans = memoryExporter.getFinishedSpans();
const span = spans[0];
assert.ok(span, 'a span is present');
assert.strictEqual(spans.length, 1);
assertSpan(span, {
hostname: 'localhost',
httpStatusCode: queryResponse.statusCode,
httpMethod: 'GET',
path: '/',
query: '?query=test',
reqHeaders: headers,
resHeaders: queryResponse.headers,
spans = memoryExporter.getFinishedSpans();
const span = spans[0];
assert.ok(span, 'a span is present');
assert.strictEqual(spans.length, 1);
assertSpan(span, {
hostname: 'localhost',
httpStatusCode: queryResponse.statusCode,
httpMethod: 'GET',
path: '/',
query: '?query=test',
reqHeaders: testCase.headers,
resHeaders: queryResponse.headers,
});
assert.strictEqual(
span.attributes['user_agent.original'],
testCase.expectedUserAgent
);
});
assert.strictEqual(
span.attributes['user_agent.original'],
undefined,
'user-agent is undefined'
);
});
}

it('should create valid span if request.path is a full URL', async function () {
let spans = memoryExporter.getFinishedSpans();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ export const assertSpan = (
if (userAgent) {
assert.strictEqual(
span.attributes[SemanticAttributes.USER_AGENT_ORIGINAL],
userAgent
Array.isArray(userAgent) ? userAgent[0] : userAgent
);
}
}
Expand Down