From 22cadcb5e8a1639d07fb2007e9b0ded7fe31e377 Mon Sep 17 00:00:00 2001
From: Michael Davis
Date: Fri, 16 May 2025 16:11:08 -0500
Subject: [PATCH] Reset consecutive failure count on range success

Currently, the count of consecutive failures is only reset when a batch
is fully completed. This can be a problem when batch sizes are
imbalanced and some batches are very large, so that a single batch
contains many ranges that each need to be processed.

Normally, when a batch encounters transient failures, the count is reset
once the batch is eventually processed and the backfill moves on. For a
large enough batch, however, enough transient failures can accumulate
across the ranges within that batch to fail the entire backfill.

This change resets the count of consecutive failures whenever a range
within a batch is processed successfully, so backfills do not get stuck
in this situation.
---
 .../cash/backfila/service/runner/statemachine/BatchAwaiter.kt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/service/src/main/kotlin/app/cash/backfila/service/runner/statemachine/BatchAwaiter.kt b/service/src/main/kotlin/app/cash/backfila/service/runner/statemachine/BatchAwaiter.kt
index be0d46433..f7515801e 100644
--- a/service/src/main/kotlin/app/cash/backfila/service/runner/statemachine/BatchAwaiter.kt
+++ b/service/src/main/kotlin/app/cash/backfila/service/runner/statemachine/BatchAwaiter.kt
@@ -75,6 +75,10 @@ class BatchAwaiter(
         }
 
         if (response.remaining_batch_range != null) {
+          // A successfully processed range within a batch counts as a successful RPC for the
+          // purposes of resetting the count of consecutive failures.
+          backfillRunner.onRpcSuccess()
+
           // We have a remaining_batch_range, continue the batch.
           remainingBatch = initialBatch.newBuilder()
             .batch_range(response.remaining_batch_range)
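To illustrate why the reset point matters, here is a minimal, self-contained sketch of the behaviour described in the commit message. It is not Backfila code: `FailureTracker`, `runBatch`, and the `maxConsecutiveFailures` threshold are hypothetical names chosen for illustration, and only `onRpcSuccess()` mirrors the call added in the diff. Each range is modelled as a list of attempts (`false` = transient failure, `true` = success), assumed to end in a success.

```kotlin
// Hypothetical stand-in for the runner's consecutive-failure bookkeeping.
// Names and threshold semantics are illustrative assumptions, not Backfila's API.
class FailureTracker(private val maxConsecutiveFailures: Int) {
  var consecutiveFailures = 0
    private set

  // Mirrors the role of backfillRunner.onRpcSuccess() in the diff.
  fun onRpcSuccess() {
    consecutiveFailures = 0
  }

  // Records a transient failure; returns true once the threshold is reached
  // and the backfill would be failed.
  fun onRpcFailure(): Boolean {
    consecutiveFailures++
    return consecutiveFailures >= maxConsecutiveFailures
  }
}

// Simulates one batch made of several ranges. Returns true if the batch
// completes, false if the failure threshold trips first.
fun runBatch(
  tracker: FailureTracker,
  rangeOutcomes: List<List<Boolean>>,
  resetOnRangeSuccess: Boolean
): Boolean {
  for ((index, attempts) in rangeOutcomes.withIndex()) {
    for (succeeded in attempts) {
      if (succeeded) {
        val batchCompleted = index == rangeOutcomes.lastIndex
        // Old behaviour: only full batch completion resets the count.
        // New behaviour: every successfully processed range resets it.
        if (resetOnRangeSuccess || batchCompleted) tracker.onRpcSuccess()
        break // move on to the next range
      } else if (tracker.onRpcFailure()) {
        return false // threshold reached: the whole backfill fails
      }
    }
  }
  return true
}
```

With a threshold of 3 and a batch of four ranges that each fail once before succeeding, the old behaviour accumulates three failures by the third range and fails the backfill, while resetting on each range success keeps the count at most 1 and lets the batch complete.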