
@tharropoulos (Contributor)

Change Summary

The process_tokens() function was using collection-level symbols_to_index and token_separators during sub-tokenization (line 4627), ignoring field-level configuration.
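The configuration passed to sub-tokenization decides which characters survive in a query token. The sketch below is a simplified toy model, not Typesense's actual Tokenizer; the function name `sub_tokenize` and its rules are assumptions for illustration. Characters listed in `token_separators` split a token, alphanumerics are kept, and any other symbol survives only if it appears in `symbols_to_index`.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified sub-tokenizer (NOT Typesense's Tokenizer class).
// - characters in `separators` end the current sub-token
// - alphanumeric characters are always kept
// - any other character is kept only if it is in `symbols_to_index`
std::vector<std::string> sub_tokenize(const std::string& token,
                                      const std::unordered_set<char>& symbols_to_index,
                                      const std::unordered_set<char>& separators) {
    std::vector<std::string> out;
    std::string current;
    for (char c : token) {
        if (separators.count(c)) {
            // Separator: flush the sub-token built so far.
            if (!current.empty()) {
                out.push_back(current);
                current.clear();
            }
        } else if (std::isalnum(static_cast<unsigned char>(c)) || symbols_to_index.count(c)) {
            current.push_back(c);
        }
        // Any other character is silently dropped.
    }
    if (!current.empty()) {
        out.push_back(current);
    }
    return out;
}
```

In this model, collection-level defaults with no symbols configured reduce the query token `c++` to `c`, while a field that indexes `+` keeps `c++`. Sub-tokenizing with the wrong level of configuration can therefore make query tokens diverge from the field's own settings.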

  • Pass most_weighted_field_symbols_to_index and most_weighted_field_token_separators to process_tokens()
  • Apply ternary logic inside process_tokens() to use field-level config or fall back to collection defaults
  • Ensure consistent tokenization behavior across the entire query parsing pipeline
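Below is a minimal sketch of the fallback described in these bullets. The `TokenizerConfig` struct and the `resolve_tokenizer_config` helper are hypothetical. Treating an empty field-level list as "not configured" is also an assumption, so the actual ternary condition inside process_tokens() may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical holder for the settings used during sub-tokenization.
struct TokenizerConfig {
    std::vector<char> symbols_to_index;
    std::vector<char> token_separators;
};

// Hypothetical helper: prefer the most weighted field's settings and fall
// back to the collection-level defaults. ASSUMPTION: an empty field-level
// list means the field did not configure that setting.
TokenizerConfig resolve_tokenizer_config(
        const std::vector<char>& most_weighted_field_symbols_to_index,
        const std::vector<char>& most_weighted_field_token_separators,
        const std::vector<char>& collection_symbols_to_index,
        const std::vector<char>& collection_token_separators) {
    TokenizerConfig cfg;
    cfg.symbols_to_index = !most_weighted_field_symbols_to_index.empty()
                               ? most_weighted_field_symbols_to_index
                               : collection_symbols_to_index;
    cfg.token_separators = !most_weighted_field_token_separators.empty()
                               ? most_weighted_field_token_separators
                               : collection_token_separators;
    return cfg;
}
```

Resolving both settings once, before sub-tokenization starts, means every call in the query path sees the same field-aware configuration.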

kishorenc pushed a commit that referenced this pull request Oct 23, 2025
… port of #2624) (#2625)

* fix: respect field-level symbols_to_index during query sub-tokenization

- pass most_weighted_field_symbols_to_index and token_separators to process_tokens
- apply ternary logic in process_tokens to fall back to collection-level defaults
- ensure sub-tokenization uses field-specific configuration consistently

* test: test sub-tokenization with custom symbols
kishorenc merged commit f987dc2 into typesense:v30 on Oct 23, 2025
2 checks passed