Description
I've noticed some errors when using a terminal "production" rule of the form
T0: T1 | T2 | T3
where all of the given expressions are terminals. These errors only occur in the standalone parser generated by Lark.js; the Python version of Lark parses the identical string correctly with the same grammar. I've isolated two hopefully-minimal-enough example cases below.
This seems to be similar to #21 in that it's related to some JavaScript-specific regex foible that gets triggered when terminals are agglomerated together via |, but as I'm not super-familiar with the internals of the library I can't be sure. As in #21, replacing VALUE with value everywhere (i.e. replacing the terminal rule with a non-terminal one) causes both of the following examples to parse correctly.
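To make "agglomerating terminals together via |" concrete, here is a minimal sketch of what folding the three terminals used by VALUE in the examples below into a single JavaScript regex could look like. It is an illustration only, not Lark.js's actual lexer code: the helper name matchValueAt and the use of the sticky flag are assumptions made for the example.

```javascript
// Hypothetical sketch: NOT Lark.js's actual implementation.
// Regex sources for the three terminals that VALUE is built from.
const NUMBER = "[0-9]+";
const BARE_WORD = "[^\\s:\\(\\)]+";
const STRING = '"(?:\\\\"|[^\\r\\n"])*"';

// Join the alternatives into one pattern. The "y" (sticky) flag
// anchors each match attempt at lastIndex.
const VALUE = new RegExp(`(?:${NUMBER}|${BARE_WORD}|${STRING})`, "y");

// Hypothetical helper: try to match VALUE at offset pos,
// returning the matched text or null.
function matchValueAt(text, pos) {
  VALUE.lastIndex = pos;
  const m = VALUE.exec(text);
  return m === null ? null : m[0];
}

console.log(matchValueAt("a:b", 0)); // "a"
console.log(matchValueAt("a:b", 1)); // null, ":" is not part of any alternative
console.log(matchValueAt("a:b", 2)); // "b"
```

Written this way the combined pattern behaves as expected on "a:b", so the sketch does not reproduce the failures; it only shows the kind of construction the terminal rule implies.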
Example 1
This grammar:
?start: thing
thing: thing W thing
| expr
expr: label W? VALUE
| VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/
fails with UnexpectedToken when attempting to parse the string "a:b", although running it in the Python version of Lark results in a correct parse.
Example 2
This grammar:
?start: thing
thing: label VALUE | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/
fails with SyntaxError: Invalid flags supplied to RegExp constructor 'nully' during lexing of the same string "a:b"; the Python version also correctly parses it.
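The flags string 'nully' in that message is exactly what JavaScript produces when null is concatenated with the sticky flag "y". Whether this is what happens inside the generated parser is not confirmed by anything in this report; the sketch below only shows, under that assumption, how such a flags string arises and that the RegExp constructor rejects it. The names missingFlags and tryCompile are hypothetical.

```javascript
// Assumption (unconfirmed): the flags value is null at the point
// where the sticky flag gets appended.
const missingFlags = null;
const combinedFlags = missingFlags + "y"; // string concatenation gives "nully"

// Hypothetical helper: compile a regex and report failure instead of throwing.
function tryCompile(source, flags) {
  try {
    return { ok: true, regex: new RegExp(source, flags) };
  } catch (err) {
    return { ok: false, name: err.name, message: err.message };
  }
}

// "n" and "l" are not valid RegExp flags, so this fails with a SyntaxError.
const bad = tryCompile("[0-9]+", combinedFlags);
console.log(combinedFlags, bad.ok, bad.name);

// Defaulting a missing flags value to "" before appending avoids the error.
const good = tryCompile("[0-9]+", (missingFlags || "") + "y");
console.log(good.ok, good.regex.flags);
```

If this is indeed the cause, it would suggest that a terminal combined with | ends up without a flags value in the generated JavaScript, which would also fit the observation that switching to a non-terminal rule avoids the problem.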