Various errors when using | inside of terminals #31

@swwu

I've noticed some errors when using a terminal "production" rule of the form

T0: T1 | T2 | T3

where all of the alternatives are terminals. These errors occur only in the standalone parser generated by Lark.js; the Python version of Lark parses the same string correctly with the same grammar. I've isolated two (hopefully minimal) example cases below.

This seems similar to #21 in that it looks related to some JavaScript-specific regex quirk that shows up when terminals are combined via |, but since I'm not very familiar with the internals of the library I can't be sure. As in #21, replacing VALUE with value everywhere (i.e. replacing the terminal rule with a non-terminal one) makes both of the following examples parse correctly.

Example 1

This grammar:

?start: thing
thing: thing W thing
    | expr
expr: label W? VALUE
    | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/

fails with UnexpectedToken when attempting to parse the string "a:b", although running it in the Python version of Lark results in a correct parse.
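To make the expected lexing of "a:b" concrete, here is a minimal sketch using only Python's `re` module. It assumes that a terminal built from other terminals behaves like a single alternation of their patterns; that is an assumption for illustration, not a description of Lark's or Lark.js's actual lexer. The helper `candidates` and the `COLON` name are hypothetical.

```python
import re

# Terminal patterns copied from the grammar above (STRING slightly simplified).
NUMBER = r"[0-9]+"
BARE_WORD = r"[^\s:\(\)]+"
STRING = r'"(?:\\"|[^\r\n"])*"'

# Assumption: VALUE acts like one alternation of its member patterns.
VALUE = f"(?:{NUMBER}|{BARE_WORD}|{STRING})"


def candidates(text, pos=0):
    """Return {terminal name: matched text} for every terminal matching at pos."""
    patterns = {"BARE_WORD": BARE_WORD, "VALUE": VALUE, "COLON": ":"}
    found = {}
    for name, pattern in patterns.items():
        m = re.compile(pattern).match(text, pos)
        if m:
            found[name] = m.group(0)
    return found


print(candidates("a:b", 0))  # both BARE_WORD and VALUE match "a"
print(candidates("a:b", 1))  # only the ":" literal matches
print(candidates("a:b", 2))  # both BARE_WORD and VALUE match "b"
```

Under this sketch, BARE_WORD and VALUE both match the leading "a" with the same length, so a successful parse depends on the lexer picking BARE_WORD there (for label) and VALUE for "b". Whether that choice is where the two implementations diverge is a guess, not something confirmed here.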

Example 2

This grammar:

?start: thing
thing: label VALUE | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/

fails with SyntaxError: Invalid flags supplied to RegExp constructor 'nully' while lexing the same string "a:b"; again, the Python version parses it correctly.
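The workaround mentioned in the description (replacing VALUE with value everywhere) can be applied mechanically. The following is a small sketch, with the Example 2 grammar held in a Python string; `demote_terminal` is a hypothetical helper name, and the whole-word substitution assumes the terminal name does not also appear inside a string literal or regex in the grammar.

```python
import re

# Example 2 grammar, copied verbatim from above.
EXAMPLE_2 = r'''
?start: thing
thing: label VALUE | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/
'''


def demote_terminal(grammar, name):
    """Rename terminal `name` to lowercase so Lark treats it as a rule."""
    return re.sub(rf"\b{re.escape(name)}\b", name.lower(), grammar)


workaround = demote_terminal(EXAMPLE_2, "VALUE")
print(workaround)
```

The rewritten grammar can then be fed back into the parser generator. In Python, assuming the `lark` package is installed and the LALR parser is used, that would be something like `Lark(workaround, parser="lalr").parse("a:b")`. Note that value becomes a tree node rather than a single token, so code that consumes the parse tree may need adjusting.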