Skip to content

Backfila times out and retries batches after 1 minute #137

@mightyguava

Description

@mightyguava

Conversation here: https://square.slack.com/archives/CET1PJ8NP/p1600206743294600?thread_ts=1600192471.251500&cid=CET1PJ8NP

Backfila is unexpectedly retrying immediately on connection failure rather than respecting the backoff schedule. We are also currently seeing a repeatable connection failure after 1 minute

okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL

While backfill logic is largely idempotent, for large batches, a parallel batch on the same ID range can adversely performance since there'll be a nice bunch of extra lock waits.

We need to dig into what in Backfila's stack that's doing the retry on connection reset. Separately, why the reset is happening at all, though that's probably outside of backfila.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions