
ct: evict from batch cache during reconciliation #29598

Open

WillemKauf wants to merge 4 commits into redpanda-data:dev from WillemKauf:l0_batch_cache_invalidate

Conversation

@WillemKauf (Contributor) commented Feb 12, 2026

We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Evicts up to a given offset.

This will be helpful for evicting up to the LRO during reconciliation.
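A minimal sketch of the eviction primitive these commits describe, using stand-in types (offset_t here stands in for model::offset; the real storage::batch_cache_index tracks cache ranges, not bare map entries, as the review comments below point out):

```cpp
#include <cstdint>
#include <map>

// Stand-in for model::offset; the real type is a strongly-typed wrapper.
using offset_t = int64_t;

// Minimal sketch of prefix eviction keyed by batch base offset.
struct batch_cache_index_sketch {
    std::map<offset_t, int> entries; // base_offset -> cached range handle

    // Evict all entries with base_offset <= o.
    void evict_up_to(offset_t o) {
        entries.erase(entries.begin(), entries.upper_bound(o));
    }
};

// During reconciliation the cutoff would be the newly persisted LRO:
//   index.evict_up_to(lro);
```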
Copilot AI left a comment

Pull request overview

This PR ensures that after a reconciliation round advances the LRO (last reconciled offset), any cached L0 batches at or below that offset are evicted to prevent stale cached reads from violating compacted-topic semantics.

Changes:

  • Add evict_up_to support to the underlying storage::batch_cache_index and wire it through the cloud-topics batch cache and data-plane API.
  • Have the reconciler invalidate (evict) the L0 batch cache up to the committed LRO after successfully persisting the LRO.
  • Add unit tests covering prefix eviction behavior and reconciler-triggered cache invalidation.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/v/storage/batch_cache.h: Adds the batch_cache_index::evict_up_to API.
  • src/v/storage/batch_cache.cc: Implements prefix eviction on the storage batch-cache index.
  • src/v/storage/tests/batch_cache_test.cc: Adds tests for batch_cache_index::evict_up_to edge cases.
  • src/v/cloud_topics/batch_cache/batch_cache.h: Exposes evict_up_to on the cloud-topics batch cache.
  • src/v/cloud_topics/batch_cache/batch_cache.cc: Implements cloud-topics cache eviction by delegating to the per-partition index.
  • src/v/cloud_topics/batch_cache/tests/batch_cache_test.cc: Adds tests ensuring prefix eviction works per tidp and handles no-op cases.
  • src/v/cloud_topics/data_plane_api.h: Extends the data-plane API with cache_evict.
  • src/v/cloud_topics/data_plane_impl.cc: Implements cache_evict by evicting from the shard-local cloud batch cache.
  • src/v/cloud_topics/frontend/tests/frontend_test.cc: Updates the data-plane mock to include cache_evict.
  • src/v/cloud_topics/reconciler/reconciliation_source.h: Extends the reconciliation source interface with invalidate_cache.
  • src/v/cloud_topics/reconciler/reconciliation_source.cc: Implements invalidate_cache for l0_source via data_plane_api::cache_evict.
  • src/v/cloud_topics/reconciler/reconciler.cc: Calls invalidate_cache(lro) after successfully bumping the LRO (sketched below).
  • src/v/cloud_topics/reconciler/tests/test_utils.h: Updates the fake source to record cache invalidation calls.
  • src/v/cloud_topics/reconciler/tests/reconciler_test.cc: Adds tests asserting cache invalidation happens on success and not on LRO update failure.
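Read together, the reconciler-side rows above suggest roughly the following control flow. This is a sketch under assumptions: source_iface, finish_reconciliation_round, and offset_t are invented stand-ins based on the per-file summaries, not the actual code.

```cpp
#include <cstdint>

using offset_t = int64_t; // stand-in for model::offset

// Stand-in for the reconciliation_source interface this PR extends.
struct source_iface {
    virtual ~source_iface() = default;
    // In the real code, l0_source implements this via
    // data_plane_api::cache_evict.
    virtual void invalidate_cache(offset_t up_to) = 0;
};

// Hypothetical reconciler step; names and shape are guesses from the
// per-file summaries, not the actual reconciler.cc.
void finish_reconciliation_round(
  source_iface& source, offset_t new_lro, bool lro_persisted) {
    if (!lro_persisted) {
        // LRO update failed: leave the cache alone (the added reconciler
        // tests assert no invalidation happens on this path).
        return;
    }
    // LRO is durable: reads at or below it are now served from L1, so the
    // cached L0 prefix is dead weight and can be dropped.
    source.invalidate_cache(new_lro);
}
```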

```cpp
void truncate(model::offset offset);

/**
 * Evicts all entries with base_offset <= the given offset.
```
Copilot AI Feb 12, 2026

The docstring for evict_up_to implies only entries with base_offset <= offset are evicted, but the implementation evicts whole cache ranges. When multiple batches share a range (small-batch coalescing), evicting one entry’s range can also evict batches with base_offset > offset. Please clarify the contract in the comment (and/or rename) so callers don’t rely on exact per-entry semantics.

Suggested change:

```diff
-* Evicts all entries with base_offset <= the given offset.
+* Evicts all cache entries whose base offset is less than or equal to the
+* given offset. Eviction is performed on underlying cache ranges rather
+* than individual batches, so when multiple batches share a cache range
+* (for example due to small-batch coalescing), this may also evict some
+* batches with base_offset > offset. Callers must not rely on precise
+* per-entry eviction semantics.
```

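To make the collateral-eviction caveat concrete, here is a C++20 toy model (stand-in types only, not the Redpanda implementation) of two small batches coalesced into one cache range:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using offset_t = int64_t; // stand-in for model::offset
using range_id = int;     // stand-in for a coalesced cache range

int main() {
    // Batches at offsets 10 and 20 were coalesced into range 1; the batch
    // at offset 30 lives alone in range 2.
    std::map<offset_t, range_id> index{{10, 1}, {20, 1}, {30, 2}};

    // evict_up_to(15) at range granularity: range 1 holds a batch with
    // base_offset <= 15, so the whole range is evicted...
    offset_t cutoff = 15;
    std::set<range_id> victims;
    for (auto& [base, range] : index) {
        if (base <= cutoff) {
            victims.insert(range);
        }
    }
    std::erase_if(index, [&](auto& e) { return victims.contains(e.second); });

    // ...which also drops the batch at offset 20 (> cutoff): exactly the
    // collateral eviction the suggested comment documents.
    assert(!index.contains(10) && !index.contains(20));
    assert(index.contains(30));
    return 0;
}
```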
```cpp
std::optional<model::record_batch>
get(const model::topic_id_partition& tidp, model::offset o);

// Evict all cached entries for the partition with base_offset <= o.
```
Copilot AI Feb 12, 2026

batch_cache::evict_up_to is documented as evicting entries with base_offset <= o, but under the hood it delegates to storage::batch_cache_index::evict_up_to, which evicts whole ranges. If multiple batches were coalesced into the same range, this can evict additional batches with base_offset > o. Please adjust the comment to reflect the range-granularity behavior so callers have an accurate contract.

Suggested change:

```diff
-// Evict all cached entries for the partition with base_offset <= o.
+// Evict cached entries for the partition up to offset `o`. Eviction is
+// performed at the granularity of coalesced index ranges, so if multiple
+// batches share a range, this may evict some batches with base_offset > o.
```

```cpp
virtual std::optional<model::record_batch>
cache_get(const model::topic_id_partition&, model::offset o) = 0;

/// Evict cached entries with base_offset <= up_to
```
Copilot AI Feb 12, 2026

cache_evict is documented as evicting entries with base_offset <= up_to, but the backing cache evicts at range granularity (small batches may be coalesced into the same range). In practice this can evict entries with base_offset > up_to as collateral. Please clarify the contract here as well so API users understand the semantics.

Suggested change:

```diff
-/// Evict cached entries with base_offset <= up_to
+/// Evict cached entries whose base_offset is less than or equal to
+/// `up_to`. Implementations typically evict at range granularity, and
+/// multiple batches may share the same cached range. As a result, this
+/// call may also evict entries with base_offset > `up_to` that fall into
+/// an evicted range; callers must not rely on such entries remaining
+/// cached after this call.
```

WillemKauf closed this Feb 12, 2026
WillemKauf reopened this Feb 12, 2026
WillemKauf requested review from Lazin and andrwng February 12, 2026 22:18
@rockwotj (Contributor) left a comment

Ideally we would do this for retention too, but that is less likely to matter. It would also be nice to only do this if the topic is compacted.

@WillemKauf (Contributor, Author)

We discussed this in person and realized two things:

  1. The compaction race mentioned in the cover letter and commit message doesn't seem possible, because we prefer to read from L1.
  2. We should still evict data from batch caches regardless of retention policy because of the above fact: if we are serving reads from L1 once reconciled, there's no point in keeping it cached in L0 (see the sketch below).
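A sketch of the read-path reasoning in point 1, with hypothetical names (read, read_from_l1, and read_from_l0_cache are invented for illustration): once an offset is at or below the LRO, L1 answers the read, so an L0 cache entry for it is unreachable and evicting it changes nothing but memory use.

```cpp
#include <cstdint>
#include <optional>

using offset_t = int64_t; // stand-in for model::offset

struct batch {}; // stand-in for model::record_batch

std::optional<batch> read_from_l1(offset_t) { return batch{}; }
std::optional<batch> read_from_l0_cache(offset_t) { return batch{}; }

// Hypothetical read dispatch: L1 wins for anything at or below the LRO,
// so reconciled L0 cache entries can never be hit again.
std::optional<batch> read(offset_t o, offset_t lro) {
    if (o <= lro) {
        return read_from_l1(o); // preferred once reconciled
    }
    return read_from_l0_cache(o); // only the unreconciled tail hits L0
}
```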

We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.
WillemKauf force-pushed the l0_batch_cache_invalidate branch from b798b0a to a4cb72a on February 13, 2026 19:42
@WillemKauf (Contributor, Author)

Force push to:

  • Update outdated commit message.

@vbotbuildovich (Collaborator)

CI test results

Test results on build #80522:

  • QuotaManagementUpgradeTest.test_upgrade (integration): FLAKY, passed 10/11. Test passes after retries; no significant increase in flaky rate (baseline=0.0020, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/80522#019c589c-3ae2-4bd0-84c1-199331c88941
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=QuotaManagementUpgradeTest&test_method=test_upgrade
  • WriteCachingFailureInjectionE2ETest.test_crash_all ({"use_transactions": false}, integration): FLAKY, passed 10/11. Test passes after retries; no significant increase in flaky rate (baseline=0.1074, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.2889, p1=0.0331, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/80522#019c58a0-aed3-4119-bdc2-aca2d1b95533
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

WillemKauf requested a review from rockwotj February 14, 2026 00:34
@rockwotj (Contributor)

> we should still evict data from batch caches regardless of retention policy because of the above fact (if we are serving reads from L1 once reconciled, there's no point in keeping it cached in L0)

We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

@WillemKauf (Contributor, Author)

> We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

True, that would necessitate some sort of range locking or introspection of current readers, right? I'm also not opposed to avoiding this case; I wonder if there is a relatively simple solution, though.

@rockwotj (Contributor)

> We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

> True, that would necessitate some sort of range locking or introspection of current readers, right? I'm also not opposed to avoiding this case; I wonder if there is a relatively simple solution, though.

Pragmatically, there could just be some time delay. Why do we want to evict in this way? Is there some measurable advantage to doing the eviction?
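One way to read the time-delay idea (purely a sketch of the proposal under discussion, not anything in this PR; all names below are invented): lag the eviction cutoff behind the LRO so readers that started before reconciliation finished still find their data.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

using offset_t = int64_t; // stand-in for model::offset
using clk = std::chrono::steady_clock;

// Hypothetical delayed-eviction policy: remember (lro, timestamp) pairs
// and only evict up to an LRO once it has aged past `grace`. In Redpanda
// this would more likely hang off a Seastar timer.
struct delayed_evictor_sketch {
    std::chrono::seconds grace{30}; // illustrative value
    std::deque<std::pair<offset_t, clk::time_point>> pending;

    void on_lro_advanced(offset_t lro) {
        pending.emplace_back(lro, clk::now());
    }

    // Called periodically; returns the offset now safe to evict up to.
    std::optional<offset_t> evictable_cutoff() {
        std::optional<offset_t> cutoff;
        while (!pending.empty()
               && clk::now() - pending.front().second >= grace) {
            cutoff = pending.front().first;
            pending.pop_front();
        }
        return cutoff;
    }
};
```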
