Skip to content

[P1] event-exporter: bounded retries with backoff for Cloud Logging writes #1031

@erain

Description

@erain

Problem

writer.Write retries indefinitely with a fixed 10s delay. During API outages or throttling, this can stall throughput and cause backlog growth without a clear signal or drop policy.

Proposed optimization

Introduce bounded retry with exponential backoff + jitter, and expose metrics for retry count and dropped batches. Optionally apply per-status handling (e.g., retry 429/5xx, fail fast on 400). This makes failure behavior predictable and prevents infinite backlog under sustained failure.

Notes / references

Acceptance criteria

  • Retries use exponential backoff with jitter and a max retry time or count.
  • Metrics expose retry attempts and dropped batches.
  • Behavior for 400 vs 429/5xx is explicit and documented.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions