Writable warm replica replication/recovery #17390

Merged: 2 commits, merged on Mar 7, 2025

skumawat2025 (Contributor) commented on Feb 19, 2025

Description

This PR aims to implement the Replica Replication and Recovery flow for indices with partial store locality (also known as writable warm indices), which were introduced with the composite directory in PR #12782. It builds upon and addresses comments received on the pending PR #14670.
For writable warm indices, where primary shards now consist of both a localDirectory and a remoteDirectory, we upload files to remote storage and maintain an LRU fileCache on the node instead of storing all files locally.
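The write/read split described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the PR's `CompositeDirectory`: the class `CompositeDirectorySketch`, its in-memory `remote` map, and the byte-based capacity are all hypothetical, and it does not reproduce the `LRUCache` / `SegmentedCache` classes this PR touches. Every write is uploaded to the "remote" store and kept in a size-bounded local LRU cache; reads are served locally when possible and otherwise fetched from remote and re-cached.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy model of a local LRU file cache in front of a remote store. */
class CompositeDirectorySketch {
    // Stand-in for the remote segment store: every file written by the shard lands here.
    private final Map<String, byte[]> remote = new HashMap<>();
    // Local cache in access order: iteration starts at the least recently used file.
    private final LinkedHashMap<String, byte[]> localCache = new LinkedHashMap<>(16, 0.75f, true);
    private final long capacityBytes;
    private long usedBytes = 0;
    private int remoteFetches = 0;

    CompositeDirectorySketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Write path: upload to remote and keep a local copy in the cache. */
    void write(String name, byte[] data) {
        remote.put(name, data);
        cache(name, data);
    }

    /** Read path: serve from the local cache if present, else download from remote and cache it. */
    byte[] read(String name) {
        byte[] data = localCache.get(name); // get() also marks the file as most recently used
        if (data != null) {
            return data;
        }
        data = remote.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        remoteFetches++;
        cache(name, data);
        return data;
    }

    private void cache(String name, byte[] data) {
        byte[] previous = localCache.put(name, data);
        if (previous != null) {
            usedBytes -= previous.length;
        }
        usedBytes += data.length;
        // Evict least recently used files until within budget; never evict the file just added.
        Iterator<Map.Entry<String, byte[]>> it = localCache.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            if (eldest.getKey().equals(name)) {
                break;
            }
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    boolean isCachedLocally(String name) {
        return localCache.containsKey(name); // containsKey does not change access order
    }

    int remoteFetches() {
        return remoteFetches;
    }
}
```

A `LinkedHashMap` constructed with `accessOrder = true` keeps the least recently used entry at the head of its iteration order, so eviction is just removing entries from the front until the byte budget is met. A file evicted locally is still readable, at the cost of one remote fetch.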
We've made several changes to the replication and recovery processes:

  1. During replication events on replicas, we now only update the NRTReplicationReaderManager with the latest CheckpointInfoResponse, avoiding unnecessary downloads of the actual file diffs on the replica.
  2. On index close, we skip the flush on replica shards, since on re-open they sync from the remote store anyway.
  3. During remote store sync in recovery scenarios, we skip copying the actual segment files from remote storage to local disk and instead create a new commit using the latest commit info from the remote segment metadata file.
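The three behaviours in the list can be summarized as a small state model that contrasts a hot (fully local) replica with a writable warm one. This is a hedged illustration only: `WarmReplicaSketch`, `CheckpointInfo`, and the counters are hypothetical names, not OpenSearch classes, and the real flow goes through NRTReplicationReaderManager, CheckpointInfoResponse, and the remote segment metadata file rather than these fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of how a replica reacts to replication, close, and recovery. */
class WarmReplicaSketch {
    /** Hypothetical stand-in for checkpoint info: a segments generation plus the files it references. */
    static final class CheckpointInfo {
        final long generation;
        final List<String> files;

        CheckpointInfo(long generation, String... files) {
            this.generation = generation;
            this.files = Arrays.asList(files);
        }
    }

    private final boolean warm;                  // true = writable warm index (composite directory)
    private final Set<String> localFiles = new HashSet<>();
    private long readerGeneration = -1;          // generation the NRT reader currently exposes
    private long commitGeneration = -1;          // generation of the last local commit
    private int filesCopied = 0;
    private int flushes = 0;

    WarmReplicaSketch(boolean warm) {
        this.warm = warm;
    }

    /** 1. Replication event: hot replicas copy the file diff; warm replicas only refresh the reader. */
    void onReplicationEvent(CheckpointInfo checkpoint) {
        if (!warm) {
            copyMissingFiles(checkpoint);
        }
        // Warm: files stay remote and are pulled into the file cache lazily on first read.
        readerGeneration = checkpoint.generation;
    }

    /** 2. Index close: warm replicas skip the flush because re-open syncs from the remote store. */
    void onClose() {
        if (!warm) {
            flushes++;
            commitGeneration = readerGeneration;
        }
    }

    /** 3. Recovery from the remote store: warm replicas skip the copy and only write a new commit. */
    void recoverFromRemote(CheckpointInfo remoteMetadata) {
        if (!warm) {
            copyMissingFiles(remoteMetadata);
        }
        commitGeneration = remoteMetadata.generation; // new local commit built from remote metadata
        readerGeneration = remoteMetadata.generation;
    }

    private void copyMissingFiles(CheckpointInfo checkpoint) {
        for (String file : checkpoint.files) {
            if (localFiles.add(file)) {
                filesCopied++; // only files not already present locally are copied
            }
        }
    }

    long readerGeneration() { return readerGeneration; }
    long commitGeneration() { return commitGeneration; }
    int filesCopied() { return filesCopied; }
    int flushes() { return flushes; }
}
```

The model makes the trade-off visible: a warm replica does no file copying on replication, close, or recovery; that cost moves to the first read of each file, which the node's file cache then absorbs.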

Related Issues

Resolves #13647

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the enhancement, Roadmap:Search, Storage:Remote, and v2.16.0 labels on Feb 19, 2025
skumawat2025 changed the title from "Writable warm replica relocation/recovery" to "Writable warm replica replication/recovery" on Feb 19, 2025

❌ Gradle check result for 6fde641: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

skumawat2025 force-pushed the warm-replica-backup branch 2 times, most recently from 719a67d to 1963b58 on February 24, 2025 04:14

❌ Gradle check result for 1963b58: FAILURE

❌ Gradle check result for 3ce8970: FAILURE

skumawat2025 (Contributor, Author) commented:

> ❌ Gradle check result for 1963b58: FAILURE
>
> Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Failing tests are flaky: #17364, #15806, #16145

github-actions bot commented on Mar 6, 2025

❌ Gradle check result for cf1b64b: FAILURE

github-actions bot commented on Mar 6, 2025

❌ Gradle check result for d920d71: FAILURE

skumawat2025 force-pushed the warm-replica-backup branch from d920d71 to 17b43da on March 6, 2025 11:05
Signed-off-by: Sandeep Kumawat <skumwt@amazon.com>
github-actions bot commented on Mar 6, 2025

❌ Gradle check result for 17b43da: FAILURE

Signed-off-by: Sandeep Kumawat <skumwt@amazon.com>
skumawat2025 force-pushed the warm-replica-backup branch from 17b43da to b216afc on March 6, 2025 15:23
github-actions bot commented on Mar 6, 2025

❕ Gradle check result for b216afc: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented on Mar 6, 2025

Codecov Report

Attention: Patch coverage is 36.17021% with 30 lines in your changes missing coverage. Please review.

Project coverage is 72.41%. Comparing base (82bbdfb) to head (b216afc).
Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...earch/index/store/remote/utils/cache/LRUCache.java | 0.00% | 10 Missing ⚠️ |
| ...org/opensearch/index/store/CompositeDirectory.java | 65.00% | 4 Missing and 3 partials ⚠️ |
| ...index/store/remote/utils/cache/SegmentedCache.java | 0.00% | 4 Missing ⚠️ |
| .../opensearch/index/engine/NRTReplicationEngine.java | 25.00% | 1 Missing and 2 partials ⚠️ |
| ...ing/allocation/allocator/RemoteShardsBalancer.java | 0.00% | 2 Missing ⚠️ |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 33.33% | 1 Missing and 1 partial ⚠️ |
| .../indices/replication/SegmentReplicationTarget.java | 0.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##               main   #17390      +/-   ##
============================================
- Coverage     72.59%   72.41%   -0.18%     
+ Complexity    65798    65727      -71     
============================================
  Files          5311     5311              
  Lines        304888   304924      +36     
  Branches      44212    44225      +13     
============================================
- Hits         221323   220819     -504     
- Misses        65501    66021     +520     
- Partials      18064    18084      +20     
```
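Codecov's figures above can be cross-checked by hand. This is a minimal sketch, assuming Codecov computes coverage as hits / (hits + misses + partials) and displays the project figure truncated to two decimals (its documented defaults are `round: down` with `precision: 2`; this repository's codecov.yml is not shown here). `CoverageMath` and its methods are hypothetical helpers. The report does not state the total number of patch lines, so the sketch infers the covered-line count from the 36.17021% figure and the 30 uncovered lines.

```java
import java.util.Arrays;

/** Hypothetical helpers for cross-checking Codecov coverage figures. */
class CoverageMath {
    /** Coverage as a percentage: hits / (hits + misses + partials); partial lines count as not covered. */
    static double coveragePercent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return 100.0 * hits / lines;
    }

    /** Truncates to two decimals, mimicking a "round down" display setting (an assumption). */
    static double roundDown2(double percent) {
        return Math.floor(percent * 100.0) / 100.0;
    }

    /** Sums per-file counts from one column of the patch table. */
    static int sum(int... counts) {
        return Arrays.stream(counts).sum();
    }

    /** Covered lines implied by a coverage percentage p and u uncovered lines: p * u / (1 - p). */
    static long impliedHits(double percent, long uncovered) {
        double p = percent / 100.0;
        return Math.round(p * uncovered / (1.0 - p));
    }

    public static void main(String[] args) {
        int misses = sum(10, 4, 4, 1, 2, 1, 1); // "Missing" counts from the patch table: 23
        int partials = sum(3, 2, 1, 1);         // "partials" counts from the patch table: 7
        long hits = impliedHits(36.17021, misses + partials);
        System.out.printf("patch: %d hits, %d misses, %d partials -> %.5f%%%n",
                hits, misses, partials, coveragePercent(hits, misses, partials));

        double base = roundDown2(coveragePercent(221323, 65501, 18064));
        double head = roundDown2(coveragePercent(220819, 66021, 18084));
        System.out.printf("project: base %.2f%%, head %.2f%%, delta %.2f%n", base, head, head - base);
    }
}
```

Counting partial lines as uncovered is what makes the "30 lines missing coverage" figure add up (23 missing plus 7 partial), and with truncation the two project totals reproduce the reported 72.59% and 72.41%.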

github-project-automation bot moved this to 👀 In review in Storage Project Board on Mar 7, 2025
gbbafna merged commit cb869c0 into opensearch-project:main on Mar 7, 2025 (30 of 32 checks passed)
github-project-automation bot moved this from 👀 In review to ✅ Done in Storage Project Board on Mar 7, 2025
skumawat2025 added the backport 2.x label on Mar 7, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 7, 2025
Signed-off-by: Sandeep Kumawat <skumwt@amazon.com>
(cherry picked from commit cb869c0)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
skumawat2025 removed the backport 2.x label on Mar 7, 2025
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Mar 18, 2025
Signed-off-by: Sandeep Kumawat <skumwt@amazon.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Labels: enhancement, Roadmap:Search, skip-changelog, Storage:Remote, v2.16.0
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Writable Warm] Recovery/Replication flow for Composite Directory
4 participants