refactor: custom lexer #437

Open
wants to merge 14 commits into
base: main
psteinroe
Collaborator

@psteinroe psteinroe commented Jul 1, 2025

  • adds a new tokenizer crate that turns a string into simple tokens
  • adds a new lexer + lexer_codegen that uses the tokenizer to lex into a new SyntaxKind enum
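
A minimal sketch of the two-stage design described above, not the code from this PR: a tokenizer splits the input into simple tokens that borrow from the source string, and a lexer maps those tokens onto a syntax-kind enum. All names here (`TokenKind`, `Token`, `tokenize`, `lex`, and the `SyntaxKind` variants) are hypothetical; the real `SyntaxKind` enum is generated by lexer_codegen and will look different.

```rust
/// Hypothetical "simple token" kinds produced by the tokenizer stage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Word,
    Number,
    Whitespace,
    Punct,
}

/// A token borrows its text from the input, so no new strings are allocated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token<'a> {
    kind: TokenKind,
    text: &'a str,
}

/// Hypothetical syntax kinds; a generated enum would have many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    SelectKw,
    FromKw,
    Ident,
    Number,
    Whitespace,
    Punct,
}

fn classify(c: char) -> TokenKind {
    if c.is_alphabetic() || c == '_' {
        TokenKind::Word
    } else if c.is_ascii_digit() {
        TokenKind::Number
    } else if c.is_whitespace() {
        TokenKind::Whitespace
    } else {
        TokenKind::Punct
    }
}

/// Stage 1: split the input into runs of same-class characters.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut chars = input.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let kind = classify(c);
        let mut end = start + c.len_utf8();
        // Punctuation is emitted one character at a time; other kinds form runs.
        if kind != TokenKind::Punct {
            while let Some(&(i, next)) = chars.peek() {
                let next_kind = classify(next);
                // A word may continue with digits (e.g. `t1`).
                let continues = next_kind == kind
                    || (kind == TokenKind::Word && next_kind == TokenKind::Number);
                if !continues {
                    break;
                }
                end = i + next.len_utf8();
                chars.next();
            }
        }
        tokens.push(Token { kind, text: &input[start..end] });
    }
    tokens
}

/// Stage 2: map simple tokens onto syntax kinds (keywords vs. identifiers, etc.).
fn lex(input: &str) -> Vec<(SyntaxKind, &str)> {
    tokenize(input)
        .into_iter()
        .map(|t| {
            let kind = match t.kind {
                TokenKind::Word if t.text.eq_ignore_ascii_case("select") => SyntaxKind::SelectKw,
                TokenKind::Word if t.text.eq_ignore_ascii_case("from") => SyntaxKind::FromKw,
                TokenKind::Word => SyntaxKind::Ident,
                TokenKind::Number => SyntaxKind::Number,
                TokenKind::Whitespace => SyntaxKind::Whitespace,
                TokenKind::Punct => SyntaxKind::Punct,
            };
            (kind, t.text)
        })
        .collect()
}
```

Calling `lex("select id from t1")` in this sketch yields seven entries, with the keywords recognized case-insensitively and every text slice pointing back into the original input.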

the new implementation is

  • much more performant (no extra string allocations, no call to C library)
  • works with broken strings (!!!!)
  • custom-made for our use case (e.g. the LineEnding variant comes with a count)
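
To illustrate two of the points above, here is a rough sketch, under stated assumptions, of a scanner that (a) collapses consecutive line endings into one token carrying a count and (b) keeps going when a string literal is never closed. The `Kind` enum, its `terminated` flag, and the `scan` function are made up for this example and are not the types from this PR; escaped quotes (`''`) are deliberately not handled.

```rust
/// Hypothetical token kinds for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    /// A run of consecutive line endings and how many there were.
    LineEnding { count: usize },
    /// A single-quoted string; `terminated` is false if the closing quote is missing.
    Str { terminated: bool },
    /// Anything else, up to the next line ending or quote.
    Other,
}

/// Scan `input` into (kind, text) pairs. Text slices borrow from `input`.
fn scan(input: &str) -> Vec<(Kind, &str)> {
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos;
        let kind = match bytes[pos] {
            b'\n' | b'\r' => {
                let mut count = 0;
                while pos < bytes.len() {
                    if bytes[pos] == b'\n' {
                        pos += 1;
                    } else if bytes[pos] == b'\r' {
                        pos += 1;
                        // Treat "\r\n" as a single line ending.
                        if pos < bytes.len() && bytes[pos] == b'\n' {
                            pos += 1;
                        }
                    } else {
                        break;
                    }
                    count += 1;
                }
                Kind::LineEnding { count }
            }
            b'\'' => {
                pos += 1; // skip the opening quote
                let mut terminated = false;
                while pos < bytes.len() {
                    let b = bytes[pos];
                    pos += 1;
                    if b == b'\'' {
                        terminated = true;
                        break;
                    }
                }
                // Reaching end of input without a closing quote is not an error:
                // the token is still emitted, just flagged as unterminated.
                Kind::Str { terminated }
            }
            _ => {
                while pos < bytes.len() && !matches!(bytes[pos], b'\n' | b'\r' | b'\'') {
                    pos += 1;
                }
                Kind::Other
            }
        };
        // Token boundaries always fall on ASCII bytes or the end of input,
        // so slicing here stays on valid UTF-8 character boundaries.
        out.push((kind, &input[start..pos]));
    }
    out
}
```

For an input such as `"a\n\n\r\nb 'oops"`, this sketch produces four tokens: the three line endings become a single `LineEnding { count: 3 }`, and the unclosed string is returned as `Str { terminated: false }` instead of aborting the scan.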

in a follow-up, we will be able to:

  • parse custom parameters that popular tools use
  • pre-process to remove unsupported stuff
  • parse non-sql content (e.g. commands) via a simple custom parser

todos:

  • use new lexer in splitter
  • make sure we support all the different parameter formats popular tools use -> will do it in a follow-up
  • tests

@psteinroe changed the title from "refactor: parser" to "refactor: lexer" on Jul 1, 2025
@psteinroe requested a review from juleswritescode on July 4, 2025 16:00
@psteinroe marked this pull request as ready for review on July 4, 2025 16:00
@psteinroe changed the title from "refactor: lexer" to "refactor: custom lexer" on Jul 4, 2025