
Feature: Implement N-gram bloom filter index to improve the performance of LIKE queries #17724


Open
b41sh opened this issue Apr 8, 2025 · 0 comments
Assignees
Labels
C-feature Category: feature

Comments

Member

b41sh commented Apr 8, 2025

Summary

Currently, LIKE queries in Databend often result in full table scans, particularly when the pattern includes a leading wildcard (e.g., LIKE '%keyword') or a complex regular expression. On large datasets, this can lead to unacceptable query latencies.

An N-gram bloom filter index addresses this problem by splitting the text data into substrings (N-grams) and indexing them ahead of time. The query engine can then check the N-grams of a search pattern against the index to quickly identify potential matches, drastically reducing the number of rows that need to be scanned.
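To make the idea concrete, here is a minimal Python sketch of the technique. It is not Databend's implementation. The class `NgramBloomFilter`, the helpers `build_block_filters` and `blocks_to_scan`, the gram size of 3, the filter size, and the use of SHA-256 as the hash are all hypothetical choices made for illustration. The sketch assumes one filter per block of rows and ignores LIKE escape characters and case folding.

```python
import hashlib
import re


class NgramBloomFilter:
    """Bloom filter over the character N-grams of the strings in one block.

    Illustrative only: sizes and hash choice are assumptions, not Databend's.
    """

    def __init__(self, n=3, num_bits=1024, num_hashes=3):
        self.n = n
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _ngrams(self, text):
        # Strings shorter than n produce no grams.
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def _positions(self, gram):
        # Derive num_hashes bit positions by salting the gram with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for gram in self._ngrams(text):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def _might_contain_gram(self, gram):
        return all((self.bits >> pos) & 1 for pos in self._positions(gram))

    def might_match_like(self, pattern):
        """Return False only if no string in the block can match the pattern."""
        # Split on the LIKE wildcards; each literal piece must appear verbatim
        # in a matching row, so every one of its N-grams must be in the filter.
        for literal in re.split(r"[%_]", pattern):
            for gram in self._ngrams(literal):
                if not self._might_contain_gram(gram):
                    return False
        return True  # possible match (or a false positive): block must be read


def build_block_filters(blocks, n=3):
    """Build one filter per block, where a block is a list of strings."""
    filters = []
    for rows in blocks:
        bloom = NgramBloomFilter(n=n)
        for row in rows:
            bloom.add(row)
        filters.append(bloom)
    return filters


def blocks_to_scan(filters, pattern):
    """Indices of the blocks that cannot be ruled out for a LIKE pattern."""
    return [i for i, bloom in enumerate(filters) if bloom.might_match_like(pattern)]
```

With two blocks such as `["databend is fast", "bloom filter"]` and `["hello world", "full table scan"]`, calling `blocks_to_scan(filters, "%bloom%")` always returns the first block, because a bloom filter has no false negatives. The second block is skipped unless hash collisions produce a false positive. A pattern whose literal parts are shorter than the gram size, such as `'%ab%'`, yields no N-grams, so no block can be pruned and every block is still scanned.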

Benefits of N-gram bloom index:

  • Significant Performance Improvement for LIKE Queries: Dramatically reduces query execution time for LIKE queries, especially those with leading wildcards or complex patterns.
  • Reduced Resource Consumption: By minimizing full table scans, an N-gram bloom index reduces CPU and I/O usage.
2 participants