Skip to content

Skip search shards with INDEX_REFRESH_BLOCK #129132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

benchaplin
Copy link
Contributor

@benchaplin benchaplin commented Jun 9, 2025

#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.


Edit: I've changed approaches to this "skipping" logic and wanted to document some reasoning:

I considered two options:

  1. Hold the same search shard iterators, but mark them as "skipped"
  2. Skip the index before resolving search shard iterators

Option 1 would result in a search response with {"_shards": { "skipped": = the number of shards we skipped due to an index refresh block. Option 2 pretends those shards don't exist, so {"_shards": { "skipped": 0, "total": < total in option 1.

I've decided to go with option 2. The reason is that this cluster block is only active when the index was just created and is still in a red state. I could not think of a reason that a user would need to be aware of the number of skipped shards in this case (a user curious about this detail would likely be able to deduce what's going on anyway). Furthermore, it simplifies the code change (see 86c9a5d).

@benchaplin benchaplin requested a review from tlrx June 9, 2025 05:56
@benchaplin benchaplin added >non-issue Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations v9.1.0 labels Jun 9, 2025
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jun 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@benchaplin benchaplin requested a review from a team as a code owner June 9, 2025 14:39
@benchaplin benchaplin force-pushed the skip_search_shards_with_index_block branch from 998cdd5 to 1ecc447 Compare June 9, 2025 17:05
@benchaplin benchaplin removed the request for review from a team June 9, 2025 17:06
@benchaplin benchaplin requested review from a team as code owners June 11, 2025 21:03
@benchaplin benchaplin force-pushed the skip_search_shards_with_index_block branch from e233cc7 to 17706e2 Compare June 11, 2025 21:04
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. The search part should be reviewed by the ES Search team.

Comment on lines 34 to 35
var addIndexBlockRequest = new AddIndexBlockRequest(IndexMetadata.APIBlock.REFRESH, "test");
client().execute(TransportAddIndexBlockAction.TYPE, addIndexBlockRequest).actionGet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refresh block should be added automatically to newly created indices as long as they have replicas and the "use refresh block" setting is enabled in the node setting. We should remove the ability to add the refresh block through the Add Index Block API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look @tlrx!

I was hoping to test this change outside of the context of Serverless. But I agree it's not appropriate to add the refresh block to that API for testing purposes only, so I will see if I can construct the scenario in some other way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I was able to get the setup I was looking for by adding the block directly to cluster state in the tests.

assertHitCount(prepareSearch().setQuery(QueryBuilders.matchAllQuery()), 0);
}

public void testSearchMultipleIndicesEachWithAnIndexRefreshBlock() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be folded into a single test, where one or more indices are randomly created, most of some with replicas but other without replicas, and then allocate zero or more search shards and check the expected results, finally assigning all search shards and check the results again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've folded this into a single test with some additional randomization. My goal is to keep the integration tests in the Serverless PR, so I'll add the test scenario you're proposing there.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass on the search related side of things and left a few questions and comments.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benchaplin thanks for the last changes, I left one more small comment but nothing that should block this PR from my end. I didn't take a closer look at the tests since @tlrx seems to have made a few suggestions there, LGTM from my end though.

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I know nothing about search shard iterators :)

ClusterService clusterService = internalCluster().getInstance(ClusterService.class, dataNode.getName());
ClusterState currentState = clusterService.state();
ClusterState newState = ClusterState.builder(currentState).blocks(blocksBuilder).build();
setState(clusterService, newState);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not intended to be used in integration test as it overrides the current data node cluster state.

For testing the INDEX_REFRESH_BLOCK I think it makes sense to only have unit tests in stateful elasticsearch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @tlrx, can you explain more the risks of doing this?

I like having fine-grained control over blocks so I can write tests that block some indices and allow others in one search - I think this is critical to test. If I can only 'set' the block by controlling active search nodes (like I do in the other PR), I can't think of a way to achieve what I want.

@@ -234,6 +237,10 @@ private Map<String, OriginalIndices> buildPerIndexOriginalIndices(
for (String index : indices) {
if (hasBlocks) {
blocks.indexBlockedRaiseException(projectId, ClusterBlockLevel.READ, index);
if (blocks.hasIndexBlock(projectState.projectId(), index, IndexMetadata.INDEX_REFRESH_BLOCK)) {
res.put(index, SKIPPED_INDICES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are doing the filtering in the right place. It is a bit counter intuitive that we would resolve the shards given the indices and the skip some of them. Can we not filter the indices to start with? Maybe that removes the need to use that SKIPPED_INDICES marker thing too :)

Do we need to check for this block only in the search API?

By the way, something probably needs to happen in ES|QL too around this (does not need to be in this PR, but I am raising the need to track that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted my approach based on our discussion (documented in the PR description). Now I ignore blocked indices before resolving shards.

Do we need to check for this block only in the search API?

I've added a test to prove that _msearch hits this code path. Are you asking about other 'search-flavored' APIs?

I believe this will work for ES|QL, as EsqlSearchShardsAction calls TransportSearchShardAction which calls TransportSearchAction#getLocalShardsIterator. But thanks for bringing that up, I'll make sure there's a follow up task created to test this in ES|QL and make any additional changes necessary.

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertResponse;

public class SearchWithIndexBlocksIT extends ESIntegTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to have ESQL test for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tracking that to be a followup task.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a small question, I like how small the change has become!

@@ -1305,7 +1305,7 @@ private void executeSearch(

Map<String, Float> concreteIndexBoosts = resolveIndexBoosts(searchRequest, projectState.cluster());

adjustSearchType(searchRequest, shardIterators.size() == 1);
adjustSearchType(searchRequest, shardIterators.size() <= 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change required? Is it necessary to adjust the search type where there's no shards to query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good question. I just tested it - looks like no. I had made the change to avoid a DfsQueryPhase error I was getting during the previous approach.

I think it's a meaningless change now, if there are no shards the search will end in AbstractSearchAsyncAction#start. I've removed it.

@benchaplin benchaplin removed request for a team July 18, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Search Catch all for Search Foundations serverless-linked Added by automation, don't add manually Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants