b41sh commented Oct 18, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR improves inverted index support for VARIANT data:

  1. Tantivy Upgrade: Upgraded the underlying Tantivy search engine to the latest version, which brings performance improvements and new features.
  2. Tantivy Search Integration: Replaced the custom search functions with Tantivy's own search capabilities. This delivers better query performance, though the index files may be slightly larger. Performance can be improved by caching index data locally.
  3. Advanced VARIANT Field Search: Added support for complex searches on fields nested inside VARIANT values, including AND, OR, IN, and range queries. This allows precise and flexible filtering of semi-structured data.

For example:

CREATE OR REPLACE TABLE test (
  id INT NULL,
  data VARIANT NULL,
  INVERTED INDEX idx1 (data)
);

INSERT INTO test VALUES 
(1, '{"user":{"name":"Alice","age":20,"hobbies":["football","swimming"]}}'),
(2, '{"user":{"name":"Bob","age":25,"hobbies":["shopping","piano"]}}'),
(3, '{"user":{"name":"Tom","age":30,"hobbies":["travel","running"]}}');


SELECT * FROM test WHERE query('data.user.name:Bob AND data.user.age:25');
╭───────────────────────────────────────────────────────────────────────────────────╮
│        id       │                               data                              │
│ Nullable(Int32) │                        Nullable(Variant)                        │
├─────────────────┼─────────────────────────────────────────────────────────────────┤
│               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
╰───────────────────────────────────────────────────────────────────────────────────╯

SELECT * FROM test WHERE query('data.user.name:Bob OR data.user.age:30');
╭───────────────────────────────────────────────────────────────────────────────────╮
│        id       │                               data                              │
│ Nullable(Int32) │                        Nullable(Variant)                        │
├─────────────────┼─────────────────────────────────────────────────────────────────┤
│               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
│               3 │ {"user":{"age":30,"hobbies":["travel","running"],"name":"Tom"}} │
╰───────────────────────────────────────────────────────────────────────────────────╯

SELECT * FROM test WHERE query('data.user.hobbies: IN [football shopping]');
╭────────────────────────────────────────────────────────────────────────────────────────╮
│        id       │                                 data                                 │
│ Nullable(Int32) │                           Nullable(Variant)                          │
├─────────────────┼──────────────────────────────────────────────────────────────────────┤
│               1 │ {"user":{"age":20,"hobbies":["football","swimming"],"name":"Alice"}} │
│               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}}      │
╰────────────────────────────────────────────────────────────────────────────────────────╯

SELECT * FROM test WHERE query('data.user.age: [25 TO 30]');
╭───────────────────────────────────────────────────────────────────────────────────╮
│        id       │                               data                              │
│ Nullable(Int32) │                        Nullable(Variant)                        │
├─────────────────┼─────────────────────────────────────────────────────────────────┤
│               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
│               3 │ {"user":{"age":30,"hobbies":["travel","running"],"name":"Tom"}} │
╰───────────────────────────────────────────────────────────────────────────────────╯
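
The four results above can be cross-checked with a small Python sketch that applies the same conditions to the three sample rows. This is only a model of the expected semantics, written with plain predicates over the parsed JSON; it does not use Databend or Tantivy, and it ignores tokenization and scoring. The helper names (`values_at`, `term`, `any_of`, `in_range`, `select_ids`) are hypothetical, and paths are written relative to the `data` column, so `data.user.name` becomes `user.name`.

```python
import json

# Sample rows copied from the INSERT statement above.
ROWS = [
    (1, '{"user":{"name":"Alice","age":20,"hobbies":["football","swimming"]}}'),
    (2, '{"user":{"name":"Bob","age":25,"hobbies":["shopping","piano"]}}'),
    (3, '{"user":{"name":"Tom","age":30,"hobbies":["travel","running"]}}'),
]


def values_at(doc, path):
    """Return the values at a dotted path as a list (an array yields its elements)."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return node if isinstance(node, list) else [node]


def term(path, value):
    """Model of `path:value`: true if any value at the path equals `value`."""
    return lambda doc: value in values_at(doc, path)


def any_of(path, candidates):
    """Model of `path: IN [a b]`: true if any value at the path is a candidate."""
    return lambda doc: any(v in candidates for v in values_at(doc, path))


def in_range(path, low, high):
    """Model of `path: [low TO high]`, treating both bounds as inclusive."""
    return lambda doc: any(
        isinstance(v, (int, float)) and low <= v <= high
        for v in values_at(doc, path)
    )


def select_ids(predicate):
    """Return the ids of the rows whose parsed `data` value satisfies the predicate."""
    return [row_id for row_id, raw in ROWS if predicate(json.loads(raw))]


name_is_bob = term("user.name", "Bob")

and_ids = select_ids(lambda d: name_is_bob(d) and term("user.age", 25)(d))
or_ids = select_ids(lambda d: name_is_bob(d) or term("user.age", 30)(d))
in_ids = select_ids(any_of("user.hobbies", {"football", "shopping"}))
range_ids = select_ids(in_range("user.age", 25, 30))

print(and_ids, or_ids, in_ids, range_ids)
```

The inclusive bounds in `in_range` are chosen to match the range example above, where both age 25 and age 30 are returned.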

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Oct 18, 2025
BohuTANG added the ci-cloud label (Build docker image for cloud test) on Oct 19, 2025

Docker Image for PR

  • tag: pr-18861-c5331ec-1760855997

note: this image tag is only available for internal use.

b41sh force-pushed the feat-inverted-index-json branch from b47670f to 7960533 on October 19, 2025 08:16
b41sh requested review from BohuTANG and sundy-li on October 19, 2025 08:42
b41sh marked this pull request as ready for review on October 19, 2025 08:43
BohuTANG merged commit 4e7358c into databendlabs:main on Oct 20, 2025
249 of 253 checks passed