fix: respect field-level symbols during query sub-tokenization #2624

tharropoulos · 2025-10-17T15:20:12Z

Change Summary

During sub-tokenization (line 4627), the process_tokens() function used the collection-level symbols_to_index and token_separators, ignoring the field-level configuration.

- Pass most_weighted_field_symbols_to_index and most_weighted_field_token_separators to process_tokens()
- Apply ternary logic inside process_tokens() to use the field-level config or fall back to the collection defaults
- Ensure consistent tokenization behavior across the entire query parsing pipeline

PR Checklist

I have read and signed the Contributor License Agreement.

- pass most_weighted_field_symbols_to_index and token_separators to process_tokens
- apply ternary logic in process_tokens to fall back to collection-level defaults
- ensure sub-tokenization uses field-specific configuration consistently

… port of #2624) (#2625)

* fix: respect field-level symbols_to_index during query sub-tokenization
  - pass most_weighted_field_symbols_to_index and token_separators to process_tokens
  - apply ternary logic in process_tokens to fall back to collection-level defaults
  - ensure sub-tokenization uses field-specific configuration consistently
* test: test sub-tokenization with custom symbols

tharropoulos added 2 commits October 17, 2025 18:11

test: test sub-tokenization with custom symbols

f353bf1

tharropoulos mentioned this pull request Oct 17, 2025

fix: respect field-level symbols during query sub-tokenization (temp port of #2624) #2625

Merged

kishorenc approved these changes Oct 23, 2025

kishorenc merged commit f987dc2 into typesense:v30 Oct 23, 2025
2 checks passed

kishorenc added the release:v30 label Oct 24, 2025
