out_cloudwatch: add entity support and remove unnecessary log content #10585

Merged
merged 6 commits into from
Aug 14, 2025

Conversation

zhihonl
Contributor

@zhihonl zhihonl commented Jul 11, 2025

Background

CloudWatch introduced a new feature called Explore related, a UI component that helps users navigate between their telemetry (metrics, logs) and AWS resources. It relies on a new field in the PutLogEvents API called Entity. More documentation about this field can be found here: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_Entity.html

Problem

The CloudWatch output plugin is what sends user logs to the backend, so it needs to populate the entity field in the PutLogEvents request. As part of this effort, we need to add logic to the CloudWatch plugin that attaches the appropriate entity data and then makes the correct PutLogEvents call.
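
For illustration only, here is a minimal sketch of what attaching an entity to a PutLogEvents body could look like. The function name `build_put_log_events` is hypothetical and is not taken from the plugin; the sketch assumes the entity has already been serialized to JSON and that the message needs no escaping.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the plugin's code: builds a PutLogEvents body
 * holding a single event. entity_json is an already-serialized Entity
 * object, or NULL to omit the field entirely.
 * Assumption: group, stream and message contain no characters that need
 * JSON escaping.
 * Returns the number of bytes written, or -1 on error or truncation.
 */
int build_put_log_events(char *buf, size_t size,
                         const char *group, const char *stream,
                         long long timestamp_ms, const char *message,
                         const char *entity_json)
{
    int n;

    if (entity_json != NULL) {
        n = snprintf(buf, size,
                     "{\"logGroupName\":\"%s\",\"logStreamName\":\"%s\","
                     "\"logEvents\":[{\"timestamp\":%lld,\"message\":\"%s\"}],"
                     "\"entity\":%s}",
                     group, stream, timestamp_ms, message, entity_json);
    }
    else {
        n = snprintf(buf, size,
                     "{\"logGroupName\":\"%s\",\"logStreamName\":\"%s\","
                     "\"logEvents\":[{\"timestamp\":%lld,\"message\":\"%s\"}]}",
                     group, stream, timestamp_ms, message);
    }

    /* snprintf reports both encoding errors (< 0) and truncation (>= size) */
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    return n;
}
```

Passing NULL for `entity_json` yields the request shape used before this change, which is roughly what disabling the feature would amount to in this sketch.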


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change

Screenshot showing the entities are properly attached: [image]
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added support for enriching CloudWatch log events with detailed entity metadata, including Kubernetes and AWS context.
    • Introduced a configuration option to enable or disable the addition of entity information in log submissions.
    • Enhanced log event payloads to include structured entity attributes when available.
  • Bug Fixes

    • Improved filtering to prevent redundant or sensitive entity fields from appearing in log message bodies.
  • Chores

    • Added debug logging for HTTP responses and payloads in log event submissions.

Signed-off-by: Zhihong Lin <zhiholin@amazon.com>
Member

@edsiper edsiper left a comment

Signed-off-by: Zhihong Lin <zhiholin@amazon.com>

coderabbitai bot commented Jul 31, 2025

Walkthrough

The changes introduce an "entity" metadata feature to the CloudWatch Logs plugin. This includes new data structures for entity attributes, logic to extract and filter entity-related fields from log records, enrich outgoing CloudWatch payloads with entity information, and configuration options to control this behavior. Supporting macros and memory management routines are also added.

Changes

Cohort / File(s): Change Summary

  • Entity Metadata Extraction, Filtering, and Payload Enrichment (plugins/out_cloudwatch_logs/cloudwatch_api.c): Adds logic to extract entity metadata from log records, filter entity fields from payloads, and enrich outgoing CloudWatch payloads with an "entity" JSON object. Introduces multiple helper functions for entity parsing, attribute serialization, and field removal. Adds debug logging for HTTP responses.
  • Entity-Related Macros (plugins/out_cloudwatch_logs/cloudwatch_api.h): Adds macro definitions for Application Insights API limits and the entity field prefix. No changes to function declarations or control flow.
  • Plugin Initialization, Entity Memory Management, and Config (plugins/out_cloudwatch_logs/cloudwatch_logs.c): Updates initialization to detect Kubernetes metadata filters, adds entity memory cleanup, and introduces a new boolean config property add_entity to control entity enrichment.
  • Entity Data Structures and Plugin Context (plugins/out_cloudwatch_logs/cloudwatch_logs.h): Introduces new structs for entity, key attributes, and attributes. Updates log_stream and plugin context structs to support entity metadata and configuration flags.

Sequence Diagram(s)

sequenceDiagram
    participant LogRecord
    participant CloudWatchPlugin
    participant Entity
    participant AWSCloudWatch

    LogRecord->>CloudWatchPlugin: Submit log event
    CloudWatchPlugin->>Entity: Parse entity fields from log record
    CloudWatchPlugin->>Entity: Remove entity fields from log body
    CloudWatchPlugin->>CloudWatchPlugin: Enrich payload with entity JSON (if enabled)
    CloudWatchPlugin->>AWSCloudWatch: Send PutLogEvents(payload)
    AWSCloudWatch-->>CloudWatchPlugin: Respond with status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

A rabbit hops through logs so bright,
With entities tucked in payloads tight.
Metadata sorted, fields swept clean,
CloudWatch now knows what they mean.
Kubernetes whispers, AWS replies—
Enriched log tales beneath the skies!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb5a46a and f2f61a7.

📒 Files selected for processing (1)
  • plugins/out_cloudwatch_logs/cloudwatch_api.c (11 hunks)
🔇 Additional comments (16)
plugins/out_cloudwatch_logs/cloudwatch_api.c (16)

44-44: LGTM!

The include for record accessor functionality is necessary for the new entity parsing features.


204-253: LGTM!

The function properly constructs entity key attributes JSON with appropriate error handling and null checking. The snprintf usage has been corrected from previous reviews.


255-372: LGTM!

The function correctly handles different platform types and constructs appropriate entity attributes JSON. Error handling is consistent and proper.


380-380: LGTM!

Variable declaration follows project conventions.


405-439: LGTM!

The entity integration logic is well-implemented with proper validation ensuring only complete entities (with service name and account ID) are included in the payload. Error handling and logging are appropriate.
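
The completeness check described here can be sketched roughly as follows. The struct, the function name, and the JSON key names (`Type`, `Name`, `AwsAccountId`) are assumptions made for illustration and should be checked against the Entity API reference; this is not the plugin's actual serialization code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified view of an entity's key attributes. */
struct key_attrs_sketch {
    const char *name;       /* service name, may be NULL */
    const char *account_id; /* AWS account ID, may be NULL */
};

/*
 * Serializes the key attributes only when the entity is complete.
 * Returns the number of bytes written, 0 when the entity is incomplete
 * (the caller would then send the payload without an entity), or -1 on
 * formatting error or truncation.
 * Assumption: values need no JSON escaping.
 */
int key_attrs_to_json(const struct key_attrs_sketch *ka,
                      char *buf, size_t size)
{
    int n;

    if (ka == NULL || ka->name == NULL || ka->account_id == NULL) {
        return 0;
    }

    n = snprintf(buf, size,
                 "{\"keyAttributes\":{\"Type\":\"Service\","
                 "\"Name\":\"%s\",\"AwsAccountId\":\"%s\"}}",
                 ka->name, ka->account_id);
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    return n;
}
```

Returning 0 rather than an error for an incomplete entity keeps the "skip the entity, still send the logs" path distinct from real failures.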


460-460: LGTM!

Correct fix for snprintf error checking - now properly checks for negative return value.
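
As a generic illustration of this kind of check (not the plugin's code), a checked `snprintf` call could look like the sketch below. The function name is hypothetical, and the truncation check goes beyond the negative-return check mentioned in the comment.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: formats a timestamp field into buf.
 * Returns 0 on success, -1 if formatting failed or the output did not fit.
 */
int format_timestamp_field(char *buf, size_t size, unsigned long long ts)
{
    int n = snprintf(buf, size, "\"timestamp\":%llu", ts);

    if (n < 0) {
        return -1; /* encoding error reported by snprintf */
    }
    if ((size_t) n >= size) {
        return -1; /* output was truncated */
    }
    return 0;
}
```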


577-649: LGTM!

The field removal functions correctly implement filtering of aws_entity prefixed fields from both nested and root level msgpack maps. The logic properly handles msgpack structure rebuilding while maintaining data integrity.
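
The filtering idea can be shown without msgpack. The sketch below works on a plain array of key/value pairs; the prefix value `aws_entity` is assumed from the description above, and the struct and function names are hypothetical. The plugin itself rebuilds msgpack maps instead.

```c
#include <string.h>
#include <stddef.h>

#define ENTITY_PREFIX     "aws_entity"              /* assumed prefix value */
#define ENTITY_PREFIX_LEN (sizeof(ENTITY_PREFIX) - 1)

/* Hypothetical stand-in for one map entry of a log record. */
struct kv_sketch {
    const char *key;
    const char *val;
};

/* A key is an entity field when it is longer than the prefix and starts with it. */
static int is_entity_key(const char *key, size_t len)
{
    return len > ENTITY_PREFIX_LEN &&
           strncmp(key, ENTITY_PREFIX, ENTITY_PREFIX_LEN) == 0;
}

/*
 * Copies every pair whose key is not an entity field into out, which must
 * have room for n entries. Returns the number of pairs kept. Entity fields
 * are filtered simply by not copying them.
 */
size_t filter_entity_fields(const struct kv_sketch *in, size_t n,
                            struct kv_sketch *out)
{
    size_t i;
    size_t kept = 0;

    for (i = 0; i < n; i++) {
        if (is_entity_key(in[i].key, strlen(in[i].key))) {
            continue;
        }
        out[kept++] = in[i];
    }
    return kept;
}
```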


1071-1111: LGTM!

The fallback environment function properly constructs environment strings with appropriate memory management and error handling. The snprintf usage is correct with proper bounds checking.


1119-1146: LGTM!

The set_entity_field function properly manages field assignment, memory allocation, and filter counting. Memory management is handled correctly with proper cleanup of existing values.
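
A minimal sketch of the "replace a string field and free the old value" pattern follows, using standard C allocation in place of Fluent Bit's own helpers. The name `set_string_field` is hypothetical, and the filter counting mentioned above is left out.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: replaces *field with a NUL-terminated copy of the
 * first len bytes of value. Any previous value is freed only after the new
 * copy has been allocated, so a failed allocation leaves *field untouched.
 * Returns 0 on success, -1 on invalid arguments or allocation failure.
 */
int set_string_field(char **field, const char *value, size_t len)
{
    char *copy;

    if (field == NULL || value == NULL) {
        return -1;
    }

    copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, value, len);
    copy[len] = '\0';

    free(*field); /* free(NULL) is a no-op */
    *field = copy;
    return 0;
}
```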


1148-1210: LGTM!

The parse_entity function uses a clean, structured approach to field mapping and extraction. The use of record accessors follows project patterns and the fallback logic for service names is well-implemented.


1212-1248: LGTM!

The update_or_create_entity function properly handles entity lifecycle with correct memory management. The allocation failure cleanup addresses previous review concerns and prevents memory leaks.
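
The cleanup-on-allocation-failure pattern can be sketched as below. The struct layout loosely mirrors the `key_attributes` and `attributes` members that appear elsewhere in this review, but all type and function names here are hypothetical and the real structs carry more fields.

```c
#include <stdlib.h>

/* Hypothetical, reduced versions of the entity structs. */
struct key_attributes_sketch {
    char *name;
};

struct attributes_sketch {
    char *workload;
};

struct entity_sketch {
    struct key_attributes_sketch *key_attributes;
    struct attributes_sketch *attributes;
};

/* Frees an entity and everything it owns; safe on NULL or partial entities. */
void entity_destroy(struct entity_sketch *e)
{
    if (e == NULL) {
        return;
    }
    if (e->key_attributes != NULL) {
        free(e->key_attributes->name);
        free(e->key_attributes);
    }
    if (e->attributes != NULL) {
        free(e->attributes->workload);
        free(e->attributes);
    }
    free(e);
}

/*
 * Allocates a zeroed entity with both sub-structs. If any allocation
 * fails, whatever was already allocated is released before returning NULL.
 */
struct entity_sketch *entity_create(void)
{
    struct entity_sketch *e = calloc(1, sizeof(*e));

    if (e == NULL) {
        return NULL;
    }
    e->key_attributes = calloc(1, sizeof(*e->key_attributes));
    e->attributes = calloc(1, sizeof(*e->attributes));
    if (e->key_attributes == NULL || e->attributes == NULL) {
        entity_destroy(e);
        return NULL;
    }
    return e;
}
```

Routing every failure path through a single destroy function that tolerates partially built objects is one common way to avoid the kind of leak an earlier review flagged.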


1264-1269: LGTM!

Proper initialization of msgpack objects for filtering, with appropriate conditional logic.


1318-1343: LGTM!

The entity processing integration is well-implemented with proper conditional checks and field filtering. The msgpack manipulation correctly removes entity fields from the log message.


1465-1468: LGTM!

Proper cleanup of msgpack objects in the success path.


1476-1479: LGTM!

Proper cleanup of msgpack objects in the error path prevents memory leaks.


2035-2036: LGTM!

The additional debug logging improves observability for CloudWatch API responses, which will be helpful for troubleshooting.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
plugins/out_cloudwatch_logs/cloudwatch_api.c (1)

637-642: Add comment to clarify intentionally empty block

The empty else if block is intentional but could be clearer with a comment explaining that root-level aws_entity fields are filtered by not packing them.

             else if (root_filtered_fields > 0 &&
                      root_kv.key.type == MSGPACK_OBJECT_STR &&
                      root_kv.key.via.str.size > AWS_ENTITY_PREFIX_LEN &&
                      strncmp(root_kv.key.via.str.ptr,
                              AWS_ENTITY_PREFIX, AWS_ENTITY_PREFIX_LEN) == 0) {
+                /* Skip root-level aws_entity fields - they are filtered by not packing them */
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35d423d and 9bb75a9.

📒 Files selected for processing (1)
  • plugins/out_cloudwatch_logs/cloudwatch_api.c (10 hunks)
🔇 Additional comments (2)
plugins/out_cloudwatch_logs/cloudwatch_api.c (2)

2029-2030: Consider security implications of debug logging

The debug logs print full HTTP response data and payload which might contain sensitive information or PII. Ensure these logs don't leak sensitive data in production environments.

Consider adding a configuration option to control verbose debug logging or redact sensitive fields from the debug output.


1198-1204: Add null checks before strlen to prevent segfaults

The code calls strlen on potentially null pointers without checking. If any of these fields are NULL, it will cause a segmentation fault.

     if (entity->key_attributes->name == NULL &&
         entity->attributes->name_source == NULL &&
         entity->attributes->workload != NULL) {
-        entity->key_attributes->name = flb_strndup(entity->attributes->workload,
-                                                 strlen(entity->attributes->workload));
+        size_t workload_len = strlen(entity->attributes->workload);
+        entity->key_attributes->name = flb_strndup(entity->attributes->workload,
+                                                 workload_len);
         entity->attributes->name_source = flb_strndup("K8sWorkload", 11);
     }

However, the NULL check for workload on line 1200 already ensures it's not NULL before the strlen call, so this specific case is safe. The pattern should still be reviewed throughout the codebase for consistency.

Likely an incorrect or invalid review comment.

Signed-off-by: Zhihong Lin <zhiholin@amazon.com>
zhihonl added 2 commits July 31, 2025 14:36
Signed-off-by: Zhihong Lin <zhiholin@amazon.com>
Signed-off-by: Zhihong Lin <zhiholin@amazon.com>
Signed-off-by: Zhihong Lin <zhiholin@amazon.com>
@zhihonl
Contributor Author

zhihonl commented Aug 4, 2025

@edsiper PR is good for review again 👀

Comment on lines +727 to +729
FLB_CONFIG_MAP_BOOL, "add_entity", "false",
0, FLB_TRUE, offsetof(struct flb_cloudwatch, add_entity),
"add entity to PutLogEvent calls"

We will want to add documentation for the new parameter to the new output plugins - https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/cloudwatch.md#configuration-parameters
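
For readers unfamiliar with the pattern, the `add_entity` entry quoted above registers a boolean option whose default is "false" and uses `offsetof` to record where the parsed value lives in the plugin context. The sketch below is a heavily simplified stand-in for that idea. It is not Fluent Bit's config map implementation, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin context holding the parsed option. */
struct plugin_ctx_sketch {
    int add_entity;
};

/* One table row: option name, default value, and where to store the result. */
struct bool_option_sketch {
    const char *name;
    const char *default_value;
    size_t offset;
};

static const struct bool_option_sketch options[] = {
    { "add_entity", "false", offsetof(struct plugin_ctx_sketch, add_entity) },
};

/* Assumption: only "true" and "on" count as enabled in this sketch. */
static int parse_bool(const char *s)
{
    return strcmp(s, "true") == 0 || strcmp(s, "on") == 0;
}

/*
 * Looks up the option by name and writes the parsed value (or the default
 * when value is NULL) into the context at the recorded offset.
 * Returns 0 on success, -1 for an unknown option name.
 */
int apply_bool_option(struct plugin_ctx_sketch *ctx,
                      const char *name, const char *value)
{
    size_t i;

    for (i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        if (strcmp(options[i].name, name) == 0) {
            int *slot = (int *) ((char *) ctx + options[i].offset);
            *slot = parse_bool(value != NULL ? value
                                             : options[i].default_value);
            return 0;
        }
    }
    return -1;
}
```

Because the default is "false", entity enrichment stays off unless a user sets the option explicitly, which is why the parameter needs to be documented.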

@edsiper edsiper merged commit b6a5ef9 into fluent:master Aug 14, 2025
53 checks passed
@edsiper
Member

edsiper commented Aug 14, 2025

thank you!

3 participants