ct: evict from batch cache during reconciliation #29598
WillemKauf wants to merge 4 commits into redpanda-data:dev from
Conversation
Evicts up to a given offset. This will be helpful for evicting up to the LRO during reconciliation.
Pull request overview
This PR ensures that after a reconciliation round advances the LRO (last reconciled offset), any cached L0 batches at or below that offset are evicted to prevent stale cached reads from violating compacted-topic semantics.
Changes:
- Add `evict_up_to` support to the underlying `storage::batch_cache_index` and wire it through the cloud-topics batch cache and data-plane API.
- Have the reconciler invalidate (evict) the L0 batch cache up to the committed LRO after successfully persisting the LRO.
- Add unit tests covering prefix eviction behavior and reconciler-triggered cache invalidation.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/storage/batch_cache.h | Adds batch_cache_index::evict_up_to API. |
| src/v/storage/batch_cache.cc | Implements prefix eviction on the storage batch-cache index. |
| src/v/storage/tests/batch_cache_test.cc | Adds tests for batch_cache_index::evict_up_to edge cases. |
| src/v/cloud_topics/batch_cache/batch_cache.h | Exposes evict_up_to on the cloud-topics batch cache. |
| src/v/cloud_topics/batch_cache/batch_cache.cc | Implements cloud-topics cache eviction by delegating to the per-partition index. |
| src/v/cloud_topics/batch_cache/tests/batch_cache_test.cc | Adds tests ensuring prefix eviction works per tidp and handles no-op cases. |
| src/v/cloud_topics/data_plane_api.h | Extends the data-plane API with cache_evict. |
| src/v/cloud_topics/data_plane_impl.cc | Implements cache_evict by evicting from the shard-local cloud batch cache. |
| src/v/cloud_topics/frontend/tests/frontend_test.cc | Updates the data-plane mock to include cache_evict. |
| src/v/cloud_topics/reconciler/reconciliation_source.h | Extends reconciliation source interface with invalidate_cache. |
| src/v/cloud_topics/reconciler/reconciliation_source.cc | Implements invalidate_cache for l0_source via data_plane_api::cache_evict. |
| src/v/cloud_topics/reconciler/reconciler.cc | Calls invalidate_cache(lro) after successfully bumping LRO. |
| src/v/cloud_topics/reconciler/tests/test_utils.h | Updates fake source to record cache invalidation calls. |
| src/v/cloud_topics/reconciler/tests/reconciler_test.cc | Adds tests asserting cache invalidation happens on success and not on LRO update failure. |
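The ordering the PR enforces (persist the LRO first, only then evict the cache, and leave the cache untouched on failure) can be sketched as follows. This is a minimal illustrative model, not the real Redpanda types: `fake_cache`, `fake_source`, and `reconcile_round` are hypothetical names, and the cache is reduced to a map keyed by base offset.

```cpp
#include <cassert>
#include <map>

using offset = long;

// Hypothetical stand-in for the L0 batch cache, keyed by base_offset.
struct fake_cache {
    std::map<offset, int> entries; // base_offset -> cached batch (stubbed)

    // Evict all entries with base_offset <= up_to.
    void evict_up_to(offset up_to) {
        entries.erase(entries.begin(), entries.upper_bound(up_to));
    }
};

// Hypothetical reconciliation source holding the cache and the LRO.
struct fake_source {
    fake_cache cache;
    offset lro = -1;
    bool persist_ok = true; // toggle to simulate persistence failure

    bool persist_lro(offset o) {
        if (!persist_ok) {
            return false;
        }
        lro = o;
        return true;
    }

    // One round: persist the new LRO, then invalidate the cache only if
    // persistence succeeded (matching the ordering described in the PR).
    bool reconcile_round(offset new_lro) {
        if (!persist_lro(new_lro)) {
            return false; // on failure, the cache is left untouched
        }
        cache.evict_up_to(lro);
        return true;
    }
};
```

This mirrors what the reconciler tests assert: invalidation happens on success and is skipped when the LRO update fails.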
```cpp
void truncate(model::offset offset);

/**
 * Evicts all entries with base_offset <= the given offset.
```
The docstring for evict_up_to implies only entries with base_offset <= offset are evicted, but the implementation evicts whole cache ranges. When multiple batches share a range (small-batch coalescing), evicting one entry’s range can also evict batches with base_offset > offset. Please clarify the contract in the comment (and/or rename) so callers don’t rely on exact per-entry semantics.
Suggested change:
```diff
- * Evicts all entries with base_offset <= the given offset.
+ * Evicts all cache entries whose base offset is less than or equal to the
+ * given offset. Eviction is performed on underlying cache ranges rather
+ * than individual batches, so when multiple batches share a cache range
+ * (for example due to small-batch coalescing), this may also evict some
+ * batches with base_offset > offset. Callers must not rely on precise
+ * per-entry eviction semantics.
```
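The range-granularity caveat above can be made concrete with a small sketch. All names here (`cache_range`, the standalone `evict_up_to`) are hypothetical stand-ins, assuming an index of coalesced ranges rather than the actual Redpanda data structures:

```cpp
#include <vector>

using offset = long;

// A coalesced cache range: several small batches share one range, and the
// range is indexed by the first batch's base offset.
struct cache_range {
    offset base;                 // base offset of the first batch in the range
    std::vector<offset> batches; // base offsets coalesced into this range
};

// Evict every range whose base offset is <= up_to. Because eviction is
// per-range, batches with base_offset > up_to that were coalesced into an
// evicted range are dropped as collateral.
std::vector<cache_range>
evict_up_to(std::vector<cache_range> ranges, offset up_to) {
    std::vector<cache_range> kept;
    for (auto& r : ranges) {
        if (r.base > up_to) {
            kept.push_back(std::move(r));
        }
    }
    return kept;
}
```

For example, with ranges `{base 0: batches 0, 5}` and `{base 10: batches 10, 15}`, evicting up to offset 12 drops both ranges, so the batch at offset 15 is evicted even though 15 > 12.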
```cpp
std::optional<model::record_batch>
get(const model::topic_id_partition& tidp, model::offset o);

// Evict all cached entries for the partition with base_offset <= o.
```
batch_cache::evict_up_to is documented as evicting entries with base_offset <= o, but under the hood it delegates to storage::batch_cache_index::evict_up_to, which evicts whole ranges. If multiple batches were coalesced into the same range, this can evict additional batches with base_offset > o. Please adjust the comment to reflect the range-granularity behavior so callers have an accurate contract.
Suggested change:
```diff
- // Evict all cached entries for the partition with base_offset <= o.
+ // Evict cached entries for the partition up to offset `o`. Eviction is
+ // performed at the granularity of coalesced index ranges, so if multiple
+ // batches share a range, this may evict some batches with base_offset > o.
```
```cpp
virtual std::optional<model::record_batch>
cache_get(const model::topic_id_partition&, model::offset o) = 0;

/// Evict cached entries with base_offset <= up_to
```
cache_evict is documented as evicting entries with base_offset <= up_to, but the backing cache evicts at range granularity (small batches may be coalesced into the same range). In practice this can evict entries with base_offset > up_to as collateral. Please clarify the contract here as well so API users understand the semantics.
Suggested change:
```diff
- /// Evict cached entries with base_offset <= up_to
+ /// Evict cached entries whose base_offset is less than or equal to
+ /// `up_to`. Implementations typically evict at range granularity, and
+ /// multiple batches may share the same cached range. As a result, this
+ /// call may also evict entries with base_offset > `up_to` that fall into
+ /// an evicted range; callers must not rely on such entries remaining
+ /// cached after this call.
```
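The test-utility changes listed above (a fake source recording cache invalidation calls) follow a standard mock pattern against an abstract interface. A minimal sketch, with hypothetical names rather than the real `data_plane_api` signature:

```cpp
#include <vector>

using offset = long;

// Hypothetical abstract interface exposing cache eviction.
struct data_plane_api {
    virtual ~data_plane_api() = default;
    virtual void cache_evict(offset up_to) = 0;
};

// Test double that records every eviction request so a reconciler test can
// assert whether (and with what offset) invalidation was triggered.
struct recording_data_plane final : data_plane_api {
    std::vector<offset> evictions;
    void cache_evict(offset up_to) override { evictions.push_back(up_to); }
};
```

A reconciler test can then drive a round through the interface and assert that exactly one `cache_evict` call was recorded with the committed LRO, and that none was recorded when the LRO update failed.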
rockwotj
left a comment
Ideally we do this for retention too, but that is less likely. Also it would be nice to only do this if the topic was compacted.
We discussed this in person and realized two things:

We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.
Force-pushed from b798b0a to a4cb72a.
CI test results: test results on build #80522
We could have races here, right? Where we start a reader, the reconciler finishes, and then we would go read from cloud for this data? I'd prefer some kind of delay or something, but I'm not opposed to freeing the cache.
True, that would necessitate some sort of range locking or introspection of current readers, right? I am also not opposed to avoiding this case; I wonder if there is a relatively simple solution, though.
Pragmatically, there could just be some time delay. Why do we want to evict in this way? Is there some measurable advantage to doing the eviction?
> We preferentially serve reads from L1 once reconciled. There is no point in keeping cached data around for L0.
Backports Required
Release Notes