
ct: evict from batch cache during reconciliation #29598

Open

WillemKauf wants to merge 4 commits into redpanda-data:dev from WillemKauf:l0_batch_cache_invalidate

Conversation

@WillemKauf (Contributor) commented Feb 12, 2026

We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Evicts up to a given offset.

This will be helpful for evicting up to the LRO during reconciliation.
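A minimal sketch of the eviction primitive these commits describe, using stand-in types (offset_t here stands in for model::offset; the real storage::batch_cache_index tracks cache ranges, not bare map entries, as the review comments below point out):

```cpp
#include <cstdint>
#include <map>

// Stand-in for model::offset; the real type is a strongly-typed wrapper.
using offset_t = int64_t;

// Minimal sketch of prefix eviction keyed by batch base offset.
struct batch_cache_index_sketch {
    std::map<offset_t, int> entries; // base_offset -> cached range handle

    // Evict all entries with base_offset <= o.
    void evict_up_to(offset_t o) {
        entries.erase(entries.begin(), entries.upper_bound(o));
    }
};

// During reconciliation the cutoff would be the newly persisted LRO:
//   index.evict_up_to(lro);
```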
Copilot AI left a comment

Pull request overview

This PR ensures that after a reconciliation round advances the LRO (last reconciled offset), any cached L0 batches at or below that offset are evicted to prevent stale cached reads from violating compacted-topic semantics.

Changes:

  • Add evict_up_to support to the underlying storage::batch_cache_index and wire it through the cloud-topics batch cache and data-plane API.
  • Have the reconciler invalidate (evict) the L0 batch cache up to the committed LRO after successfully persisting the LRO.
  • Add unit tests covering prefix eviction behavior and reconciler-triggered cache invalidation.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/v/storage/batch_cache.h: Adds the batch_cache_index::evict_up_to API.
  • src/v/storage/batch_cache.cc: Implements prefix eviction on the storage batch-cache index.
  • src/v/storage/tests/batch_cache_test.cc: Adds tests for batch_cache_index::evict_up_to edge cases.
  • src/v/cloud_topics/batch_cache/batch_cache.h: Exposes evict_up_to on the cloud-topics batch cache.
  • src/v/cloud_topics/batch_cache/batch_cache.cc: Implements cloud-topics cache eviction by delegating to the per-partition index.
  • src/v/cloud_topics/batch_cache/tests/batch_cache_test.cc: Adds tests ensuring prefix eviction works per tidp and handles no-op cases.
  • src/v/cloud_topics/data_plane_api.h: Extends the data-plane API with cache_evict.
  • src/v/cloud_topics/data_plane_impl.cc: Implements cache_evict by evicting from the shard-local cloud batch cache.
  • src/v/cloud_topics/frontend/tests/frontend_test.cc: Updates the data-plane mock to include cache_evict.
  • src/v/cloud_topics/reconciler/reconciliation_source.h: Extends the reconciliation source interface with invalidate_cache.
  • src/v/cloud_topics/reconciler/reconciliation_source.cc: Implements invalidate_cache for l0_source via data_plane_api::cache_evict.
  • src/v/cloud_topics/reconciler/reconciler.cc: Calls invalidate_cache(lro) after successfully bumping the LRO (sketched below).
  • src/v/cloud_topics/reconciler/tests/test_utils.h: Updates the fake source to record cache invalidation calls.
  • src/v/cloud_topics/reconciler/tests/reconciler_test.cc: Adds tests asserting cache invalidation happens on success and not on LRO update failure.
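Read together, the reconciler-side rows above suggest roughly the following control flow. This is a sketch under assumptions: source_iface, finish_reconciliation_round, and offset_t are invented stand-ins based on the per-file summaries, not the actual code.

```cpp
#include <cstdint>

using offset_t = int64_t; // stand-in for model::offset

// Stand-in for the reconciliation_source interface this PR extends.
struct source_iface {
    virtual ~source_iface() = default;
    // In the real code, l0_source implements this via
    // data_plane_api::cache_evict.
    virtual void invalidate_cache(offset_t up_to) = 0;
};

// Hypothetical reconciler step; names and shape are guesses from the
// per-file summaries, not the actual reconciler.cc.
void finish_reconciliation_round(
  source_iface& source, offset_t new_lro, bool lro_persisted) {
    if (!lro_persisted) {
        // LRO update failed: leave the cache alone (the added reconciler
        // tests assert no invalidation happens on this path).
        return;
    }
    // LRO is durable: reads at or below it are now served from L1, so the
    // cached L0 prefix is dead weight and can be dropped.
    source.invalidate_cache(new_lro);
}
```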

```cpp
void truncate(model::offset offset);

/**
 * Evicts all entries with base_offset <= the given offset.
```
Copilot AI Feb 12, 2026

The docstring for evict_up_to implies only entries with base_offset <= offset are evicted, but the implementation evicts whole cache ranges. When multiple batches share a range (small-batch coalescing), evicting one entry’s range can also evict batches with base_offset > offset. Please clarify the contract in the comment (and/or rename) so callers don’t rely on exact per-entry semantics.

Suggested change:

```diff
-* Evicts all entries with base_offset <= the given offset.
+* Evicts all cache entries whose base offset is less than or equal to the
+* given offset. Eviction is performed on underlying cache ranges rather
+* than individual batches, so when multiple batches share a cache range
+* (for example due to small-batch coalescing), this may also evict some
+* batches with base_offset > offset. Callers must not rely on precise
+* per-entry eviction semantics.
```

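To make the collateral-eviction caveat concrete, here is a C++20 toy model (stand-in types only, not the Redpanda implementation) of two small batches coalesced into one cache range:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using offset_t = int64_t; // stand-in for model::offset
using range_id = int;     // stand-in for a coalesced cache range

int main() {
    // Batches at offsets 10 and 20 were coalesced into range 1; the batch
    // at offset 30 lives alone in range 2.
    std::map<offset_t, range_id> index{{10, 1}, {20, 1}, {30, 2}};

    // evict_up_to(15) at range granularity: range 1 holds a batch with
    // base_offset <= 15, so the whole range is evicted...
    offset_t cutoff = 15;
    std::set<range_id> victims;
    for (auto& [base, range] : index) {
        if (base <= cutoff) {
            victims.insert(range);
        }
    }
    std::erase_if(index, [&](auto& e) { return victims.contains(e.second); });

    // ...which also drops the batch at offset 20 (> cutoff): exactly the
    // collateral eviction the suggested comment documents.
    assert(!index.contains(10) && !index.contains(20));
    assert(index.contains(30));
    return 0;
}
```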
```cpp
std::optional<model::record_batch>
get(const model::topic_id_partition& tidp, model::offset o);

// Evict all cached entries for the partition with base_offset <= o.
```
Copilot AI Feb 12, 2026

batch_cache::evict_up_to is documented as evicting entries with base_offset <= o, but under the hood it delegates to storage::batch_cache_index::evict_up_to, which evicts whole ranges. If multiple batches were coalesced into the same range, this can evict additional batches with base_offset > o. Please adjust the comment to reflect the range-granularity behavior so callers have an accurate contract.

Suggested change:

```diff
-// Evict all cached entries for the partition with base_offset <= o.
+// Evict cached entries for the partition up to offset `o`. Eviction is
+// performed at the granularity of coalesced index ranges, so if multiple
+// batches share a range, this may evict some batches with base_offset > o.
```

```cpp
virtual std::optional<model::record_batch>
cache_get(const model::topic_id_partition&, model::offset o) = 0;

/// Evict cached entries with base_offset <= up_to
```
Copilot AI Feb 12, 2026

cache_evict is documented as evicting entries with base_offset <= up_to, but the backing cache evicts at range granularity (small batches may be coalesced into the same range). In practice this can evict entries with base_offset > up_to as collateral. Please clarify the contract here as well so API users understand the semantics.

Suggested change:

```diff
-/// Evict cached entries with base_offset <= up_to
+/// Evict cached entries whose base_offset is less than or equal to
+/// `up_to`. Implementations typically evict at range granularity, and
+/// multiple batches may share the same cached range. As a result, this
+/// call may also evict entries with base_offset > `up_to` that fall into
+/// an evicted range; callers must not rely on such entries remaining
+/// cached after this call.
```

WillemKauf closed this Feb 12, 2026
WillemKauf reopened this Feb 12, 2026
WillemKauf requested review from Lazin and andrwng February 12, 2026 22:18
@rockwotj (Contributor) left a comment

Ideally we would do this for retention too, but that is less likely to matter. It would also be nice to only do this if the topic is compacted.

@WillemKauf (Contributor, Author)

We discussed this in person and realized two things:

  1. The compaction race mentioned in the cover letter and commit message doesn't seem possible, because we prefer to read from L1.
  2. We should still evict data from batch caches regardless of retention policy because of the above fact: if we are serving reads from L1 once reconciled, there's no point in keeping it cached in L0 (see the sketch below).
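A sketch of the read-path reasoning in point 1, with hypothetical names (read, read_from_l1, and read_from_l0_cache are invented for illustration): once an offset is at or below the LRO, L1 answers the read, so an L0 cache entry for it is unreachable and evicting it changes nothing but memory use.

```cpp
#include <cstdint>
#include <optional>

using offset_t = int64_t; // stand-in for model::offset

struct batch {}; // stand-in for model::record_batch

std::optional<batch> read_from_l1(offset_t) { return batch{}; }
std::optional<batch> read_from_l0_cache(offset_t) { return batch{}; }

// Hypothetical read dispatch: L1 wins for anything at or below the LRO,
// so reconciled L0 cache entries can never be hit again.
std::optional<batch> read(offset_t o, offset_t lro) {
    if (o <= lro) {
        return read_from_l1(o); // preferred once reconciled
    }
    return read_from_l0_cache(o); // only the unreconciled tail hits L0
}
```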

We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.
WillemKauf force-pushed the l0_batch_cache_invalidate branch from b798b0a to a4cb72a on February 13, 2026 19:42
@WillemKauf (Contributor, Author)

Force push to:

  • Update outdated commit message.

@vbotbuildovich (Collaborator)

CI test results

Test results on build #80522:

  • QuotaManagementUpgradeTest.test_upgrade (integration): FLAKY, passed 10/11. Test passes after retries; no significant increase in flaky rate (baseline=0.0020, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/80522#019c589c-3ae2-4bd0-84c1-199331c88941
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=QuotaManagementUpgradeTest&test_method=test_upgrade
  • WriteCachingFailureInjectionE2ETest.test_crash_all ({"use_transactions": false}, integration): FLAKY, passed 10/11. Test passes after retries; no significant increase in flaky rate (baseline=0.1074, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.2889, p1=0.0331, trust_threshold=0.5000).
    Job: https://buildkite.com/redpanda/redpanda/builds/80522#019c58a0-aed3-4119-bdc2-aca2d1b95533
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

WillemKauf requested a review from rockwotj February 14, 2026 00:34
@rockwotj (Contributor)

> we should still evict data from batch caches regardless of retention policy because of the above fact (if we are serving reads from L1 once reconciled, there's no point in keeping it cached in L0)

We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

@WillemKauf (Contributor, Author)

> We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

True, that would necessitate some sort of range locking or introspection of current readers, right? I'm also not opposed to avoiding this case; I wonder if there is a relatively simple solution, though.

@rockwotj (Contributor)

> We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data. I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.

> True, that would necessitate some sort of range locking or introspection of current readers, right? I'm also not opposed to avoiding this case; I wonder if there is a relatively simple solution, though.

Pragmatically, there could just be some time delay. Why do we want to evict in this way? Is there some measurable advantage to doing the eviction?
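One way to read the time-delay idea (purely a sketch of the proposal under discussion, not anything in this PR; all names below are invented): lag the eviction cutoff behind the LRO so readers that started before reconciliation finished still find their data.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

using offset_t = int64_t; // stand-in for model::offset
using clk = std::chrono::steady_clock;

// Hypothetical delayed-eviction policy: remember (lro, timestamp) pairs
// and only evict up to an LRO once it has aged past `grace`. In Redpanda
// this would more likely hang off a Seastar timer.
struct delayed_evictor_sketch {
    std::chrono::seconds grace{30}; // illustrative value
    std::deque<std::pair<offset_t, clk::time_point>> pending;

    void on_lro_advanced(offset_t lro) {
        pending.emplace_back(lro, clk::now());
    }

    // Called periodically; returns the offset now safe to evict up to.
    std::optional<offset_t> evictable_cutoff() {
        std::optional<offset_t> cutoff;
        while (!pending.empty()
               && clk::now() - pending.front().second >= grace) {
            cutoff = pending.front().first;
            pending.pop_front();
        }
        return cutoff;
    }
};
```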
