Handle structured log messages #131027
base: main
Conversation
🔍 Preview links for changed docs
Hi @eyalkoren, I've created a changelog YAML for you.
Pinging @elastic/es-data-management (Team:Data Management)
Mostly looked at the documentation, left a comment
@flash1293 Thanks for reviewing the docs! I will correct what you've pointed out.
I checked those, they look good. Maybe we should capture the edge case discussed above via a test as well - what happens if …
I can add tests for that, although this PR doesn't deal with such cases based on our decision.
Sure, it would just make sure that it behaves as we expect.
@flash1293 I applied your suggestions in 7208fce. As I promised, the new logic that handles structured messages doesn't touch anything that is not … However, this highlighted something: the existing normalization logic treats any type of … So I am not proposing to change anything, but raising this for awareness and additional consideration.
Cool, thanks!
Yeah, I think that's OK and expected. Elasticsearch will either cast it or, if that doesn't work, set the ignored fields, so we should have sensible behavior every time.
LGTM
Pull Request Overview
This PR adds support for handling structured log messages in the `normalize_for_stream` processor by detecting JSON format in the `message` field and processing it based on ECS format conventions. The implementation reuses code from the `JsonProcessor` by making the `ingest-otel` module dependent on the `ingest-common` module.
Key changes include:
- Enhanced message field processing to detect and parse JSON structures
- ECS vs non-ECS format detection based on `@timestamp` field presence (a minimal detection sketch follows this list)
- Module dependency updates to enable code reuse from `ingest-common`
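As an illustration of the detection step only, here is a minimal, self-contained Java sketch; it is not the actual processor code. The class name `StructuredMessageSketch`, the `Kind` enum, and the use of Jackson's `ObjectMapper` as a stand-in for `JsonProcessor` are all assumptions made for this example:

```java
// Minimal sketch of the detection flow described above (not the real implementation):
// treat the message as structured only if it looks like a JSON object, and consider it
// ECS-style when a top-level @timestamp key is present. Jackson stands in for the
// JsonProcessor used by the actual module.
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

final class StructuredMessageSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    enum Kind { NOT_STRUCTURED, ECS_JSON, NON_ECS_JSON }

    static Kind classify(String message) {
        String trimmed = message.trim();
        if (trimmed.startsWith("{") == false || trimmed.endsWith("}") == false) {
            return Kind.NOT_STRUCTURED; // kept as a plain string, e.g. under body.text
        }
        try {
            Map<?, ?> parsed = MAPPER.readValue(trimmed, Map.class);
            return parsed.containsKey("@timestamp") ? Kind.ECS_JSON : Kind.NON_ECS_JSON;
        } catch (Exception e) {
            return Kind.NOT_STRUCTURED; // parse failure: keep the original string
        }
    }
}
```

Under these assumptions, a message such as `{"@timestamp":"2025-01-01T00:00:00Z","log.level":"INFO"}` would classify as ECS-style JSON, while any non-JSON message stays a plain string.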
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Summary per file:

File | Description
---|---
NormalizeForStreamProcessor.java | Added structured message detection and processing logic with ECS format handling
NormalizeForStreamProcessorTests.java | Added comprehensive test cases for ECS and non-ECS JSON message normalization
20_normalize_json_message.yml | Added integration tests for structured message processing scenarios
IngestCommonPlugin.java | Made plugin extensible to allow dependency from other modules
build.gradle | Added dependency configuration for the ingest-common module
module-info.java | Updated module dependencies to include required logging and ingest-common
normalize-for-stream.md | Added documentation for structured message field processing
131027.yaml | Added changelog entry for the new feature
Comments suppressed due to low confidence (5)
modules/ingest-otel/src/main/java/org/elasticsearch/ingest/otel/NormalizeForStreamProcessor.java:66
- [nitpick] The logger variable should follow Java naming conventions and be named 'logger' instead of 'log'.
private static final Logger log = LogManager.getLogger(NormalizeForStreamProcessor.class);
modules/ingest-otel/src/main/java/org/elasticsearch/ingest/otel/NormalizeForStreamProcessor.java:143
- [nitpick] Use 'logger' instead of 'log' to follow Java naming conventions.
log.warn("Failed to parse structured message, keeping it as a string in 'body.text' field: {}", e.getMessage());
modules/ingest-otel/src/main/java/org/elasticsearch/ingest/otel/NormalizeForStreamProcessor.java:124
- [nitpick] Use 'logger' instead of 'log' to follow Java naming conventions.
log.debug(
modules/ingest-otel/src/main/java/org/elasticsearch/ingest/otel/NormalizeForStreamProcessor.java:130
- [nitpick] Use 'logger' instead of 'log' to follow Java naming conventions.
log.debug(
modules/ingest-otel/src/main/java/org/elasticsearch/ingest/otel/NormalizeForStreamProcessor.java:138
- [nitpick] Use 'logger' instead of 'log' to follow Java naming conventions.
log.debug("Structured message is not a JSON object, keeping it as a string in 'body.text' field: {}", message);
message = message.trim();
if (message.startsWith("{") && message.endsWith("}")) {
    // if the message is a JSON object, we assume it is a structured log
    Object parsedMessage = JsonProcessor.apply(message, true, true);
I'm assuming we already parse JSON in other pipelines, but are there any new security concerns here, as we are parsing arbitrary and possibly unsanitized inputs?
The new thing here is the fact that it will be on by default for `logs-*-*` data streams.
I am not familiar with the security concerns that may be related to such JSON parsing.
As for data sanitation, I guess that if sanitation is applied either on the `message` field or on any other fields after the JSON parsing, this is equivalent to the existing sanitation of non-structured logs.
But let me ask for input on both issues.
Small correction: it will be on for `logs,logs.*`, not `logs-*-*`.
About security concerns: are you concerned about the case where an attacker could control the `message`, but not the whole document, with this approach essentially giving them control over the whole document? It's a good thought, and I think it's true, but I don't see how it would be problematic. The whole idea of the logs stream is that it can handle arbitrary incoming data in a sensible, opinionated way.
It's a couple of things: is there any way for a malicious payload to "break out" and mess up the document?
I'm also wondering in general how secure the JSON parsing code is, given that it will receive potentially end-user-crafted payloads (if the operator is logging user input from their apps), but I think that's a lesser concern, as I guess we already use it elsewhere and we have the whole sandboxing / entitlements system protecting ES as a whole.
Logic looks sound and tests look good. Just a couple of quick questions around performance and security, to make sure my assumptions match reality (my assumption being that everything looks in line with how this will be used 😄).
@felixbarny @flash1293 please see @lukewhiting's comments above and provide your feedback on these issues. I was mostly concerned about the dependency on another module - not because I can think of any specific issue with it, but mainly because there may be hidden implications that I am not aware of.
Discussed this again with @LucaWintergerst and we decided to move forward with this approach - let's accept the tradeoffs and document them well. Reasoning: we feel confident that this is the better default behavior. While there is a small potential for undesired behavior, users can work around it in various ways - the convenience for many outweighs that.
@lukewhiting could you approve from your side? I think Eyal did a good job documenting the behavior in this PR already.
Following an in-person call with Joe and Luca, I'm happy that we understand the risks of the current approach, and we are good to proceed.
The change LGTM. We've shied away from parsing arbitrary JSON before because, with dynamic mappings, it can easily lead to a mess. However, streams changes the equation by disabling dynamic mappings, and this processor is specifically meant for streams. I also like the different parsing logic for ECS and non-ECS JSON. It's orthogonal to this, but I'm thinking it would make sense to have an OTel processor for parsing ECS logs as well, inspired by what's done here.
Closes #130333
In order to reuse code from `JsonProcessor`, I needed to make the `ingest-otel` module dependent on the `ingest-common` module. For this, I made `IngestCommonPlugin` an `ExtensiblePlugin`. Maybe there's a better way to enable such a dependency.
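For illustration only, here is a minimal sketch of what marking a plugin as extensible can look like. `ExamplePlugin` is a hypothetical name, this is not the actual `IngestCommonPlugin` change, and it assumes the Elasticsearch server classes are on the classpath:

```java
// Hypothetical example, not the real IngestCommonPlugin change: it only illustrates
// marking a plugin as extensible so that other modules or plugins can extend it.
import org.elasticsearch.plugins.ExtensiblePlugin;
import org.elasticsearch.plugins.Plugin;

public class ExamplePlugin extends Plugin implements ExtensiblePlugin {
    // ExtensiblePlugin's methods have default implementations (e.g. loadExtensions),
    // so simply implementing the interface is enough to mark the plugin as extensible.
}
```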