@notmikedavis notmikedavis commented May 16, 2025

Currently, the count of consecutive failures is reset only when a batch is fully completed. This is a problem when batch sizes are imbalanced and some batches are very large, so that a single batch contains many ranges to process. Normally, when a batch encounters transient failures, the count is reset once the batch is eventually processed and the backfill moves on. In a large enough batch, however, the ranges within it can accumulate enough transient failures to fail the entire backfill. This change resets the count of consecutive failures whenever a range within a batch is processed, so backfills no longer get stuck in this situation.
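
To illustrate the difference, here is a minimal, self-contained sketch of the two reset policies. It is not the project's actual code: `Runner`, `TransientError`, `BackfillFailed`, `max_consecutive_failures`, and the "fail once per range" behaviour are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure while processing a range."""


class BackfillFailed(Exception):
    """Hypothetical error raised once the failure count reaches the limit."""


@dataclass
class Runner:
    max_consecutive_failures: int   # assumed threshold, not a real setting name
    reset_per_range: bool           # True models the behaviour described above
    consecutive_failures: int = 0

    def run_batch(self, ranges: List[str], process: Callable[[str], None]) -> None:
        for r in ranges:
            while True:
                try:
                    process(r)
                except TransientError:
                    self.consecutive_failures += 1
                    if self.consecutive_failures >= self.max_consecutive_failures:
                        raise BackfillFailed(f"gave up while processing {r}")
                    continue  # retry the same range
                if self.reset_per_range:
                    # New policy: any successfully processed range clears the count.
                    self.consecutive_failures = 0
                break
        # Both policies clear the count once the whole batch is done.
        self.consecutive_failures = 0


def fail_once_per_range() -> Callable[[str], None]:
    """Build a processor whose first attempt at each range fails transiently."""
    seen = set()

    def process(r: str) -> None:
        if r not in seen:
            seen.add(r)
            raise TransientError(r)

    return process


def completes(reset_per_range: bool) -> bool:
    """Run one large batch of 10 ranges with a limit of 5 failures."""
    runner = Runner(max_consecutive_failures=5, reset_per_range=reset_per_range)
    try:
        runner.run_batch([f"range-{i}" for i in range(10)], fail_once_per_range())
    except BackfillFailed:
        return False
    return True
```

Under these assumptions, `completes(False)` gives up on the fifth range because failures from earlier, already recovered ranges keep accumulating, while `completes(True)` finishes the batch because each successful range clears the count.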

@notmikedavis notmikedavis marked this pull request as ready for review May 16, 2025 21:29
@mpawliszyn mpawliszyn merged commit d6a27e6 into cashapp:master May 16, 2025
4 checks passed
2 participants