Skip to content

Implement non-utf8 string to bytes in OTLP exporters.#3991

Open
owent wants to merge 4 commits intoopen-telemetry:mainfrom
owent:validate_utf8_string_in_otlp
Open

Implement non-utf8 string to bytes in OTLP exporters.#3991
owent wants to merge 4 commits intoopen-telemetry:mainfrom
owent:validate_utf8_string_in_otlp

Conversation

@owent
Copy link
Copy Markdown
Member

@owent owent commented Apr 12, 2026

Fixes ##3491

Changes

  • Mapping non UTF8 string to bytes in OLTP exporters.

https://github.com/open-telemetry/opentelemetry-specification/blame/v1.55.0/specification/common/attribute-type-mapping.md.
Reimplement #3512 to meet the latest spec.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

Copilot AI review requested due to automatic review settings April 12, 2026 15:55
@owent owent requested a review from a team as a code owner April 12, 2026 15:55
@owent owent force-pushed the validate_utf8_string_in_otlp branch from 2299799 to f66b834 Compare April 12, 2026 15:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the OTLP exporter attribute/value population logic to validate UTF-8 for string-like inputs and encode non-UTF8 values as bytes_value, aligning exporter output with the latest spec’s non-UTF8 mapping guidance. It also introduces a protobuf-sourced UTF-8 validation dependency to support this conversion in both CMake and Bazel builds.

Changes:

  • Add UTF-8 validation in OtlpPopulateAttributeUtils and convert invalid string inputs to bytes_value.
  • Remove the allow_bytes parameter from PopulateAnyValue/PopulateAttribute and update all call sites accordingly.
  • Add a new UTF-8 validity dependency wiring for OTLP (CMake + Bazel) and update OTLP log recordable tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
exporters/otlp/src/otlp_populate_attribute_utils.cc Adds UTF-8 validation and maps invalid strings to bytes_value; updates array-string handling similarly.
exporters/otlp/include/opentelemetry/exporters/otlp/otlp_populate_attribute_utils.h Updates public helper signatures to remove allow_bytes.
exporters/otlp/src/otlp_log_recordable.cc Updates log body/attribute population calls to new helper signatures.
exporters/otlp/src/otlp_recordable.cc Updates span attribute/event/link attribute population calls to new helper signatures.
exporters/otlp/src/otlp_metric_utils.cc Updates metric point attribute population calls to new helper signatures.
exporters/otlp/test/otlp_log_recordable_test.cc Updates tests to assert non-UTF8 string_view becomes bytes_value.
exporters/otlp/CMakeLists.txt Adds protobuf-sourced utf8_range/validity library integration and links OTLP recordable against it.
exporters/otlp/BUILD Adds a Bazel helper library to bring in protobuf’s utf8_range/utf8_validity deps and wires it into otlp_recordable.
Comments suppressed due to low confidence (1)

exporters/otlp/CMakeLists.txt:149

  • opentelemetry_otlp_recordable now links against ${OPENTELEMETRY_OTLP_DEP_UTF8_VALIDITY}, but the generated otlp_recordable pkg-config module doesn’t list a corresponding Requires: entry. This can break pkg-config consumers at link time (missing -lopentelemetry_otlp_utf8_validity when the bundled target is used). Consider conditionally adding opentelemetry_otlp_utf8_validity to the opentelemetry_add_pkgconfig(otlp_recordable, ...) requires list when that target is built.
  opentelemetry_add_pkgconfig(
    otlp_recordable
    "OpenTelemetry OTLP - Recordable"
    "OTLP recordable implementation for spans, metrics, and logs."
    "opentelemetry_trace opentelemetry_metrics opentelemetry_logs opentelemetry_resources"
  )

Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_log_recordable.cc Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.33%. Comparing base (264d972) to head (abd6f54).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3991      +/-   ##
==========================================
+ Coverage   90.32%   90.33%   +0.02%     
==========================================
  Files         230      230              
  Lines        7299     7299              
==========================================
+ Hits         6592     6593       +1     
+ Misses        707      706       -1     

see 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@dbarker dbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature @owent! Please see my comments below.

Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
@owent owent changed the title Implement non-utf8 string to bytes in OTLP exporters. [WIP] Implement non-utf8 string to bytes in OTLP exporters. Apr 18, 2026
@owent owent force-pushed the validate_utf8_string_in_otlp branch from 644af12 to 4ba4995 Compare April 18, 2026 13:32
@owent owent force-pushed the validate_utf8_string_in_otlp branch from 4ba4995 to 5de988c Compare April 18, 2026 13:52
@owent owent changed the title [WIP] Implement non-utf8 string to bytes in OTLP exporters. Implement non-utf8 string to bytes in OTLP exporters. Apr 19, 2026
Copy link
Copy Markdown
Member

@dbarker dbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the feature. Please see a few minor cleanup questions below.

Comment thread cmake/protobuf.cmake
# 2. Find an installed Protobuf package
# 3. Use FetchContent to fetch and build Protobuf from GitHub

if(DEFINED gRPC_PROVIDER AND NOT gRPC_PROVIDER STREQUAL "find_package" AND TARGET libprotobuf)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the changes appear to be new formatting only. Can these format only changes be easily reverted?

PUBLIC opentelemetry_otlp_recordable)

if(OPENTELEMETRY_INSTALL)
if(TARGET opentelemetry_otlp_utf8_validity)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this separate target and otlp_utf8_validity package still needed?

Copy link
Copy Markdown
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

This PR actually implements the specification, which should be a good thing in general.

In this particular case, and this is an opinion, the spec is a bad idea.

Please consider to add an option to opt-out and not add this extra UTF-8 validation, for applications that are instrumented correctly (strings are string, bytes sequences are binary), which do not need the overhead.

proto_value->set_string_value(nostd::get<const char *>(value));
const char *str_value = nostd::get<const char *>(value);
#if defined(ENABLE_OTLP_UTF8_VALIDITY)
if (utf8_range::IsStructurallyValid(str_value))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to opt-out and NOT perform UTF-8 validation on every string ?

My concern is with performance overhead.

Symbol ENABLE_OTLP_UTF8_VALIDITY is defined if the proper protobuf library is found, but this is not something a end user can control with a CMake option like WITH_OTLP_UTF8_VALIDITY.

}
else
{
proto_value->set_bytes_value(str_value, strlen(str_value));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, strlen means the instrumented application can send binary in a string, as long as the binary does not contains a nul character ...

A properly instrumented application will need to provide a sequence of bytes instead of a string anyway, to work properly.

But this change (which faithfully implement the specs) makes every application slower, due to the added UTF-8 validation, to try to accommodate poorly instrumented applications using the wrong types.

if(TARGET utf8_range::utf8_validity)
target_link_libraries(
opentelemetry_otlp_recordable
PUBLIC opentelemetry_logs opentelemetry_metrics utf8_range::utf8_validity)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to utf8_range::utf8_validity with PRIVATE should be sufficient?

Copy link
Copy Markdown
Contributor

@ThomsonTan ThomsonTan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider add a CHANGELOG entry for this feature work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants