Implement non-utf8 string to bytes in OTLP exporters.#3991
Implement non-utf8 string to bytes in OTLP exporters.#3991owent wants to merge 4 commits intoopen-telemetry:mainfrom
Conversation
2299799 to
f66b834
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates the OTLP exporter attribute/value population logic to validate UTF-8 for string-like inputs and encode non-UTF8 values as bytes_value, aligning exporter output with the latest spec’s non-UTF8 mapping guidance. It also introduces a protobuf-sourced UTF-8 validation dependency to support this conversion in both CMake and Bazel builds.
Changes:
- Add UTF-8 validation in
OtlpPopulateAttributeUtilsand convert invalid string inputs tobytes_value. - Remove the
allow_bytesparameter fromPopulateAnyValue/PopulateAttributeand update all call sites accordingly. - Add a new UTF-8 validity dependency wiring for OTLP (CMake + Bazel) and update OTLP log recordable tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| exporters/otlp/src/otlp_populate_attribute_utils.cc | Adds UTF-8 validation and maps invalid strings to bytes_value; updates array-string handling similarly. |
| exporters/otlp/include/opentelemetry/exporters/otlp/otlp_populate_attribute_utils.h | Updates public helper signatures to remove allow_bytes. |
| exporters/otlp/src/otlp_log_recordable.cc | Updates log body/attribute population calls to new helper signatures. |
| exporters/otlp/src/otlp_recordable.cc | Updates span attribute/event/link attribute population calls to new helper signatures. |
| exporters/otlp/src/otlp_metric_utils.cc | Updates metric point attribute population calls to new helper signatures. |
| exporters/otlp/test/otlp_log_recordable_test.cc | Updates tests to assert non-UTF8 string_view becomes bytes_value. |
| exporters/otlp/CMakeLists.txt | Adds protobuf-sourced utf8_range/validity library integration and links OTLP recordable against it. |
| exporters/otlp/BUILD | Adds a Bazel helper library to bring in protobuf’s utf8_range/utf8_validity deps and wires it into otlp_recordable. |
Comments suppressed due to low confidence (1)
exporters/otlp/CMakeLists.txt:149
opentelemetry_otlp_recordablenow links against${OPENTELEMETRY_OTLP_DEP_UTF8_VALIDITY}, but the generatedotlp_recordablepkg-config module doesn’t list a correspondingRequires:entry. This can break pkg-config consumers at link time (missing-lopentelemetry_otlp_utf8_validitywhen the bundled target is used). Consider conditionally addingopentelemetry_otlp_utf8_validityto theopentelemetry_add_pkgconfig(otlp_recordable, ...)requires list when that target is built.
opentelemetry_add_pkgconfig(
otlp_recordable
"OpenTelemetry OTLP - Recordable"
"OTLP recordable implementation for spans, metrics, and logs."
"opentelemetry_trace opentelemetry_metrics opentelemetry_logs opentelemetry_resources"
)
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3991 +/- ##
==========================================
+ Coverage 90.32% 90.33% +0.02%
==========================================
Files 230 230
Lines 7299 7299
==========================================
+ Hits 6592 6593 +1
+ Misses 707 706 -1 🚀 New features to boost your workflow:
|
644af12 to
4ba4995
Compare
4ba4995 to
5de988c
Compare
dbarker
left a comment
There was a problem hiding this comment.
Looks good. Thanks for the feature. Please see a few minor cleanup questions below.
| # 2. Find an installed Protobuf package | ||
| # 3. Use FetchContent to fetch and build Protobuf from GitHub | ||
|
|
||
| if(DEFINED gRPC_PROVIDER AND NOT gRPC_PROVIDER STREQUAL "find_package" AND TARGET libprotobuf) |
There was a problem hiding this comment.
Most of the changes appear to be new formatting only. Can these format only changes be easily reverted?
| PUBLIC opentelemetry_otlp_recordable) | ||
|
|
||
| if(OPENTELEMETRY_INSTALL) | ||
| if(TARGET opentelemetry_otlp_utf8_validity) |
There was a problem hiding this comment.
Is this separate target and otlp_utf8_validity package still needed?
marcalff
left a comment
There was a problem hiding this comment.
Thanks for the PR.
This PR actually implements the specification, which should be a good thing in general.
In this particular case, and this is an opinion, the spec is a bad idea.
Please consider to add an option to opt-out and not add this extra UTF-8 validation, for applications that are instrumented correctly (strings are string, bytes sequences are binary), which do not need the overhead.
| proto_value->set_string_value(nostd::get<const char *>(value)); | ||
| const char *str_value = nostd::get<const char *>(value); | ||
| #if defined(ENABLE_OTLP_UTF8_VALIDITY) | ||
| if (utf8_range::IsStructurallyValid(str_value)) |
There was a problem hiding this comment.
Is there any way to opt-out and NOT perform UTF-8 validation on every string ?
My concern is with performance overhead.
Symbol ENABLE_OTLP_UTF8_VALIDITY is defined if the proper protobuf library is found, but this is not something a end user can control with a CMake option like WITH_OTLP_UTF8_VALIDITY.
| } | ||
| else | ||
| { | ||
| proto_value->set_bytes_value(str_value, strlen(str_value)); |
There was a problem hiding this comment.
Here, strlen means the instrumented application can send binary in a string, as long as the binary does not contains a nul character ...
A properly instrumented application will need to provide a sequence of bytes instead of a string anyway, to work properly.
But this change (which faithfully implement the specs) makes every application slower, due to the added UTF-8 validation, to try to accommodate poorly instrumented applications using the wrong types.
| if(TARGET utf8_range::utf8_validity) | ||
| target_link_libraries( | ||
| opentelemetry_otlp_recordable | ||
| PUBLIC opentelemetry_logs opentelemetry_metrics utf8_range::utf8_validity) |
There was a problem hiding this comment.
Link to utf8_range::utf8_validity with PRIVATE should be sufficient?
ThomsonTan
left a comment
There was a problem hiding this comment.
Consider add a CHANGELOG entry for this feature work.
Fixes ##3491
Changes
For significant contributions please make sure you have completed the following items:
CHANGELOG.mdupdated for non-trivial changes