
Datadog output plugin malforming the JSON payload when dd_tags is defined. #10113


Status: Open
KayOS65 opened this issue Mar 21, 2025 · 7 comments
Labels: bug, waiting-for-release (This has been fixed/merged but it's waiting to be included in a release.)


KayOS65 commented Mar 21, 2025

Bug Report

Describe the bug

When dd_tags is specified, messages forwarded from a Kubernetes cluster to Datadog via the Datadog output plugin produce a malformed JSON payload. The Datadog tags are not applied to the forwarded messages; instead, Datadog receives otherwise unidentified messages, one with a body of just the key (ddtags) and another with a body of just the value (env:stage,region:au).

Datadog Live Tail screenshot (image omitted): the bare "ddtags" and "env:stage,region:au" entries appear as separate, unidentified log messages.

Example JSON payload - the strings "ddtags" and "env:stage,region:au" appear as separate top-level array elements rather than inside the message objects:

[
    {
        "timestamp": 1742518068503,
        "ddsource": "my-cluster",
        "service": "aws.eks",
        "hostname": "my-cluster",
        "time": "2025-03-21T00:47:48.50289474Z",
        "stream": "stderr",
        "_p": "F",
        "message": "[2025/03/21 00:47:48] [error] [engine] chunk '1-1742518058.406111356.flb' cannot be retried: task_id=9, input=tail.0 > output=datadog.1",
        "kubernetes": {
            "pod_name": "fluent-bit-26fvn",
            "namespace_name": "fluent-bit",
            "pod_id": "2642c452-c020-4664-b344-799a656e181c",
            "labels": {
                "app.kubernetes.io/instance": "fluent-bit",
                "app.kubernetes.io/name": "fluent-bit",
                "controller-revision-hash": "5f948c84f9",
                "pod-template-generation": "18"
            },
            "annotations": {
                "checksum/config": "e705e5cecd9c068adf98af8fb483ee28b463a0678dc60502ed39af271210b113",
                "kubectl.kubernetes.io/restartedAt": "2025-03-21T09:49:08+10:00"
            },
            "host": "ip-10-155-5-194.ap-southeast-2.compute.internal",
            "container_name": "fluent-bit",
            "docker_id": "8257505dfdd80f6c300b4726b6bf28b8875e56c19694af85217bd9b150b27f40",
            "container_image": "cr.fluentbit.io/fluent/fluent-bit:3.2.8"
        }
    },
    "ddtags",
    "env:stage,region:au",
    {
        "timestamp": 1742518068503,
        "ddsource": "my-cluster",
        "service": "aws.eks",
        "hostname": "my-cluster",
        "time": "2025-03-21T00:47:48.503027657Z",
        "stream": "stderr",
        "_p": "F",
        "message": "[2025/03/21 00:47:48] [error] [engine] chunk '1-1742518057.276699898.flb' cannot be retried: task_id=8, input=tail.0 > output=datadog.1",
        "kubernetes": {
            "pod_name": "fluent-bit-26fvn",
            "namespace_name": "fluent-bit",
            "pod_id": "2642c452-c020-4664-b344-799a656e181c",
            "labels": {
                "app.kubernetes.io/instance": "fluent-bit",
                "app.kubernetes.io/name": "fluent-bit",
                "controller-revision-hash": "5f948c84f9",
                "pod-template-generation": "18"
            },
            "annotations": {
                "checksum/config": "e705e5cecd9c068adf98af8fb483ee28b463a0678dc60502ed39af271210b113",
                "kubectl.kubernetes.io/restartedAt": "2025-03-21T09:49:08+10:00"
            },
            "host": "ip-10-155-5-194.ap-southeast-2.compute.internal",
            "container_name": "fluent-bit",
            "docker_id": "8257505dfdd80f6c300b4726b6bf28b8875e56c19694af85217bd9b150b27f40",
            "container_image": "cr.fluentbit.io/fluent/fluent-bit:3.2.8"
        }
    },
    "ddtags",
    "env:stage,region:au"
]

To Reproduce
Helm install fluent-bit:

  • helm upgrade --install fluent-bit -n fluent-bit --create-namespace fluent/fluent-bit
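
If the fluent chart repository is not already configured, add it first (assuming the standard Fluent Helm charts repo URL):

  • helm repo add fluent https://fluent.github.io/helm-charts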

Apply the following fluent-bit.conf:

[SERVICE]
    Daemon Off
    Flush 1
    Log_Level info
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser docker, cri
    Tag kube.*
    Mem_Buf_Limit 5MB
    Skip_Long_Lines On

[INPUT]
    Name systemd
    Tag host.*
    Systemd_Filter _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail On

[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_URL         https://kubernetes.default.svc:443
    Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix  kube.var.log.containers.
    Merge_Log        On
    Merge_Log_Key    log_processed
    
[OUTPUT]
    Name        datadog
    Match       kube.*
    Host        http-intake.logs.datadoghq.com
    TLS         on
    compress    gzip
    apikey      <redacted>
    dd_service  aws.eks
    dd_source   my-cluster
    dd_tags     env:stage,region:au
    dd_hostname my-cluster

I updated the kubernetes filter and added the datadog output in the K8s ConfigMap after installation. Alternatively, the config could be applied at installation time via Helm values, as sketched below.
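
For the Helm-values route, a minimal sketch (this assumes the chart's classic-config values layout, where config.outputs carries the [OUTPUT] sections; adapt to your chart version):

config:
  outputs: |
    [OUTPUT]
        Name        datadog
        Match       kube.*
        Host        http-intake.logs.datadoghq.com
        TLS         on
        compress    gzip
        apikey      <redacted>
        dd_service  aws.eks
        dd_source   my-cluster
        dd_tags     env:stage,region:au
        dd_hostname my-cluster

Applied with: helm upgrade --install fluent-bit -n fluent-bit --create-namespace fluent/fluent-bit -f values.yaml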

Expected behavior
The log messages are forwarded to Datadog with the ddtags attribute set to the specified key:value pairs.

Example JSON payload - ddtags is a key in the message object, set to the value env:stage,region:au:

[
    {
        "timestamp": 1742518068503,
        "ddsource": "my-cluster",
        "ddtags": "env:stage,region:au",
        "service": "aws.eks",
        "hostname": "my-cluster",
        "time": "2025-03-21T00:47:48.503027657Z",
        "stream": "stderr",
        "_p": "F",
        "message": "[2025/03/21 00:47:48] [error] [engine] chunk '1-1742518057.276699898.flb' cannot be retried: task_id=8, input=tail.0 > output=datadog.1",
        "kubernetes": {
            "pod_name": "fluent-bit-26fvn",
            "namespace_name": "fluent-bit",
            "pod_id": "2642c452-c020-4664-b344-799a656e181c",
            "labels": {
                "app.kubernetes.io/instance": "fluent-bit",
                "app.kubernetes.io/name": "fluent-bit",
                "controller-revision-hash": "5f948c84f9",
                "pod-template-generation": "18"
            },
            "annotations": {
                "checksum/config": "e705e5cecd9c068adf98af8fb483ee28b463a0678dc60502ed39af271210b113",
                "kubectl.kubernetes.io/restartedAt": "2025-03-21T09:49:08+10:00"
            },
            "host": "ip-10-155-5-194.ap-southeast-2.compute.internal",
            "container_name": "fluent-bit",
            "docker_id": "8257505dfdd80f6c300b4726b6bf28b8875e56c19694af85217bd9b150b27f40",
            "container_image": "cr.fluentbit.io/fluent/fluent-bit:3.2.8"
        }
    }
]

Your Environment

  • Version used: Fluent Bit v3.2.8 (Git commit: d13e8e4ab2029fa92600b7d1d0da28f8dcc350eb)
  • Kubernetes v1.31
  • Amazon Linux 2 (AWS EKS Managed Node Group AMIs)
  • Filters and plugins: Datadog output plugin, Kubernetes filter

Additional context
This issue is preventing us from fully onboarding our K8s clusters into Datadog, compromising our ability to properly monitor the workloads running on them.

patrick-stephens (Contributor) commented:

> I updated the kubernetes filter and added the datadog output in the K8s ConfigMap after installation.

Do you mean you manually changed the config map? You will need to trigger a hot reload or restart FB pods to pick it up. If you did it via helm then it will auto-roll the pods as it uses a checksum annotation.
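
For reference, a manual restart could look like this (assuming the chart's default DaemonSet name and namespace from the install command above):

  • kubectl rollout restart daemonset/fluent-bit -n fluent-bit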

KayOS65 (Author) commented Mar 22, 2025

> > I updated the kubernetes filter and added the datadog output in the K8s ConfigMap after installation.
>
> Do you mean you manually changed the config map? You will need to trigger a hot reload or restart FB pods to pick it up. If you did it via helm then it will auto-roll the pods as it uses a checksum annotation.

I do a kubectl rollout restart after a config update. I am sure the Fluent Bit pods are using the config posted above.

mwdd146980 commented:

Hi all, thanks for reporting this! We looked into this and found a bug in how the dd_hostname config option is handled in the Datadog Fluent Bit plugin. We're working on fixing it in this PR: #10104. We don't have a timeline for the fix yet, but you can subscribe to the PR to be notified when it gets merged.

In the meantime, if you want the logs to ingest properly, removing the dd_hostname config option will stop the payload from becoming malformed. If you need a workaround, you can set a hostname tag with the dd_tags config option or append a hostname with a record modifier; see the sketch below. Keep in mind, though, that the record modifier will add the hostname to all logs that Fluent Bit processes through that filter.
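
A minimal sketch of both workarounds against the reporter's config above (the host: tag key and the hostname record key are illustrative choices, not confirmed plugin behavior):

    # Option 1: drop dd_hostname and carry the host as a tag instead
    [OUTPUT]
        Name        datadog
        Match       kube.*
        Host        http-intake.logs.datadoghq.com
        TLS         on
        compress    gzip
        apikey      <redacted>
        dd_service  aws.eks
        dd_source   my-cluster
        dd_tags     env:stage,region:au,host:my-cluster

    # Option 2: drop dd_hostname and add a hostname key with record_modifier
    # (note: this adds the key to every record matched by the filter)
    [FILTER]
        Name    record_modifier
        Match   kube.*
        Record  hostname my-cluster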

patrick-stephens (Contributor) commented:

Thanks for picking this up, @mwdd146980.


mwdd146980 commented Mar 31, 2025

Hi all, thanks for your patience on this. The fix for this has been merged and is scheduled for version 4.0. We are also working on backporting the fix to version 3.2.

KayOS65 (Author) commented Apr 1, 2025

Thanks @patrick-stephens and @mwdd146980!

patrick-stephens added the waiting-for-release label on Apr 1, 2025.
mwdd146980 commented:

Update: the fix has gone live in Fluent Bit 4.0.0 (Release Fluent Bit 4.0.0 · fluent/fluent-bit); the backport to 3.2 is still pending.
