Skip to content

Fix index lookup when field-caps returns empty mapping #132138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

smalyshev
Copy link
Contributor

@smalyshev smalyshev commented Jul 29, 2025

Field-caps resolution can return empty resolution if it receives empty mapping. This can happen when using KEEP, for example:

FROM logs-*| LOOKUP JOIN values_lookup ON v | KEEP v

when values_lookup does not have the field v. In this case, field-caps will return the empty mapping, since there are no fields that match the requested field set, and this code in IndexResolver will convert it to empty resolution:

 boolean allEmpty = true;
        for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
            allEmpty &= ir.get().isEmpty();
        }
        // If all the mappings are empty we return an empty set of resolved indices to line up with QL
        var index = new EsIndex(indexPattern, rootFields, allEmpty ? Map.of() : concreteIndices, partiallyUnmappedFields);

which breaks the response processing code for lookup indices.

Closes #132105

@elasticsearchmachine
Copy link
Collaborator

Hi @smalyshev, I've created a changelog YAML for you.

@smalyshev smalyshev marked this pull request as ready for review July 29, 2025 23:01
@smalyshev smalyshev requested a review from nik9000 July 29, 2025 23:01
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 29, 2025
@smalyshev smalyshev requested a review from alex-spies July 29, 2025 23:01
() -> runQuery("FROM logs-*,c*:logs-* | LOOKUP JOIN values_lookup ON v | KEEP v", randomBoolean())
);
// FIXME: strictly speaking this message is not correct, as the index is available, but the field is not
assertThat(ex.getMessage(), containsString("lookup index [values_lookup] is not available"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be tricky to fix?
It would be nice to avoid misleading the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat tricky, as when we get empty resolution, we don't really know if that's because the index is missing or the fields are missing. I could try and see if there's a way around it...

@@ -374,6 +374,34 @@ public void testLookupJoinMissingKey() throws IOException {
);
assertThat(ex.getMessage(), containsString("Unknown column [local_tag] in right side of join"));

// Add KEEP clause to try and trick the field-caps result parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when KEEP is present, field-caps call only fetches the restricted set of fields, and the code here:

boolean allEmpty = true;
        for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
            allEmpty &= ir.get().isEmpty();
        }
        // If all the mappings are empty we return an empty set of resolved indices to line up with QL
        var index = new EsIndex(indexPattern, rootFields, allEmpty ? Map.of() : concreteIndices, partiallyUnmappedFields);

in IndexResolver can produce empty mapping. Which hasn't been handled properly by the code because I assumed this could not happen.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something needs to be fixed, but I don't think it's this teams' responsibility.

The bug fix here is an workaround for this bug #107031.

This specific issue with empty lookup index is reproduceable when the mapping of that index is {}.
If the mapping of the lookup index is still "empty", but "empty" is expressed as

{
    "mappings": {
        "properties" : {
            
        }
    }
}

then the bug is not reproduceable. And the output of field_caps shouldn't be empty no matter what (see the bug report above).

@javanna @piergm we hit again the bug report above from more than an year ago, this time with lookup join functionality in ES|QL. Is there anything stopping that great PR from @piergm to be merged?

@smalyshev
Copy link
Contributor Author

@astefan I am not sure I understand. The code that converts empty mappings into empty resolution is in IndexResolver - see above - and the return of empty mappings from field-caps is legitimate. Field caps does not return empty result - it returns empty mapping, but IndexResolver code here:

boolean allEmpty = true;
        for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
            allEmpty &= ir.get().isEmpty();
        }
        // If all the mappings are empty we return an empty set of resolved indices to line up with QL
        var index = new EsIndex(indexPattern, rootFields, allEmpty ? Map.of() : concreteIndices, partiallyUnmappedFields);

converts it to empty resolution. I'm not sure how #107031 is relevant here, could you explain?

@astefan
Copy link
Contributor

astefan commented Jul 30, 2025

@smalyshev sure. Happy to provide more context.

If I create an index with no fields like this: PUT /test {"mappings":{"properties" :{}}} and run field_caps on it POST /test/_field_caps?fields=* I get something back from field_caps (see below).

And this something is actually useful to ESQL. I know for sure we use _index and it's likely (but not sure) that _index_mode is also useful.

{
    "indices": [
        "test"
    ],
    "fields": {
        "_routing": {
            "_routing": {
                "type": "_routing",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": false
            }
        },
        "_inference_fields": {
            "_inference_fields": {
                "type": "_inference_fields",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_doc_count": {
            "integer": {
                "type": "integer",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_ignored_source": {
            "_ignored_source": {
                "type": "_ignored_source",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_index": {
            "_index": {
                "type": "_index",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": true
            }
        },
        "_feature": {
            "_feature": {
                "type": "_feature",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_index_mode": {
            "_index_mode": {
                "type": "_index_mode",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": true
            }
        },
        "_tier": {
            "keyword": {
                "type": "keyword",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": true
            }
        },
        "_ignored": {
            "_ignored": {
                "type": "_ignored",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": true
            }
        },
        "_seq_no": {
            "_seq_no": {
                "type": "_seq_no",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": true
            }
        },
        "_nested_path": {
            "_nested_path": {
                "type": "_nested_path",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": false
            }
        },
        "_field_names": {
            "_field_names": {
                "type": "_field_names",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": false
            }
        },
        "_data_stream_timestamp": {
            "_data_stream_timestamp": {
                "type": "_data_stream_timestamp",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_source": {
            "_source": {
                "type": "_source",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": false
            }
        },
        "_id": {
            "_id": {
                "type": "_id",
                "metadata_field": true,
                "searchable": true,
                "aggregatable": false
            }
        },
        "_version": {
            "_version": {
                "type": "_version",
                "metadata_field": true,
                "searchable": false,
                "aggregatable": true
            }
        }
    }
}

But if I create an index like PUT /test {} (I simply omit the mappings section) and run field_caps on it POST /test/_field_caps?fields=* I get nothing back from field_caps (see below).

{
    "indices": [
        "test"
    ],
    "fields": {}
}

Why is not ES providing _index and _index_mode and all other metadata fields? This is metadata associated with any ES index. If the index exists, those metadata fields should exist as well, they are not controlled by the mappings section that I omitted, no?

@smalyshev
Copy link
Contributor Author

@astefan but getting nothing from field-caps is not unusual. If I use KEEP, fields would not be * in the field-caps request and then field-caps will still return {}. This is a correct response - I did not ask for _index etc. so I am not receiving them. But when IndexResolver processes it it converts "no mappings" response into "no indices" which is confusing lookup join resolution.

@astefan
Copy link
Contributor

astefan commented Jul 30, 2025

@astefan but getting nothing from field-caps is not unusual. If I use KEEP, fields would not be * in the field-caps request and then field-caps will still return {}

Now I am not sure of what the actual scenario is that this PR is trying to fix. I tried to read between the lines in comments, PR description and tests what is the failing scenario and the way you created the text index matters.

For the record, the second test scenario above that I mentioned reproduce the issue this PR is trying to fix, but the first one doesn't with a query like from test | lookup join empty-index. So, I am looking forward for more test scenarios so I can better understand and maybe validate/invalidate my previous statement about this being just another issue caused by #107031. Thank you!

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated my research and tests with the updated reproducing data from the description of the PR and I believe the fix in this PR is ok.

For completeness sake, please add three more tests to this PR:

  • one that uses a values_lookup index that is empty created like this PUT /values_lookup with request body as {}
  • one that uses a values_lookup index that is empty created like this PUT /values_lookup with request body as {"mappings":{"properties" :{}},"settings":{"index.mode":"lookup"}}
  • one that uses a values_lookup index that is empty created like this PUT /values_lookup with request body as {"mappings":{"properties" :{}}}

@astefan astefan self-requested a review July 31, 2025 14:50
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in IndexResolver since this is likely to be a question for others looking at that code here, add a comment something like

Introduced with #46775
We need to be able to differentiate between an empty mapping index and an empty index due to fields not being found. An empty mapping index will generate no columns (important) for a query like FROM empty-mapping-index, whereas an empty result here but for fields that do not exist in the index (but the index has a mapping) will result in "VerificationException Unknown column" errors.

@smalyshev
Copy link
Contributor Author

@astefan I think I was able to fix the error message discrepancy and also added some empty index tests. Please review again. Thanks!

@elastic elastic deleted a comment from github-actions bot Jul 31, 2025
@smalyshev smalyshev removed request for a team July 31, 2025 17:57
@@ -518,6 +523,16 @@ private PreAnalysisResult receiveLookupIndexResolution(
}
return result.addLookupIndexResolution(index, lookupIndexResolution);
}

if (lookupIndexResolution.get().indexNameWithModes().isEmpty() && lookupIndexResolution.resolvedIndices().isEmpty() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which scenario is this possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same scenario we're discussing, but instead of local cluster we have remote clusters. Then the branch above will not be executed, but the code will try to validate remote cluster's indices. But since we've got the empty mappings, it would be confused into thinking there's no indices there, and report a wrong error. So this part fixes it and let's the verifier report the correct error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same scenario we're discussing

Can you please, be specific? (scenarios confusion led to long discussions and prolonging this PR merge)

In my mind there are these three scenarios plus the ones where the mapping of the lookup index is not empty but the query uses keep v. Plus those involving CCS.
Note: Every scenario above should be tested with local only indices (no CCS) as well, btw (can be done in a follow up PR or opened an issue for it to be addressed later).

Copy link
Contributor Author

@smalyshev smalyshev Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the scenario where we are calling field-caps to resolve lookup join, with field list set to some restricted list (not "*") and the lookup index does not have any of those fields, so we're receiving the field-caps response with an empty map. Example:

FROM remote:logs-*| LOOKUP JOIN values_lookup ON v | KEEP v

this is the same query as in the issue description, just with the remote cluster, not the local one. The difference exists because local-only and remote scenarios are handled by different code branches, due to skip_unavailable complexities etc., so the initial fix only covered the local-only branch and led to the fact that the remote scenario produced wrong error message - "unknown index" instead of "field missing".

testLookupJoinEmptyIndex tests the three empty index scenarios you've described (as much as I could reproduce them from transport level test - it doesn't really do JSON REST APIs there) and it tests them in both local and remote scenarios.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Error resolving lookup indices
4 participants