JSON-to-g4 conversion emits only "parser" rules, which causes syntax errors

In my experimental environment, I found that the JSON-to-g4 conversion generates only "parser" rules, which causes syntax errors when inputs are parsed. These parsing errors may lead to the loss of a large amount of mutated data.

I made a minimal test case, `lex.json`:
```
{
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [
            ["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
            ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"]
    ]
}
```

Building it with Grammar-Mutator's `make` generates the following `Grammar.g4`:
```
grammar Grammar;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
node_STRING
    : '(' node_HEXSTRING ')'
    ;
node_HEXSTRING
    : 
    | node_CHAR node_HEXSTRING
    ;
node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;
```

I prepared the input data `seed1` / `seed2` and used `antlr4-parse` to test them:

![Screen Shot 2024-01-18 at 17 03 03](https://github.com/AFLplusplus/Grammar-Mutator/assets/21287921/e8e442b8-6769-4f59-8af9-c52be4b54f52)

Why is `10(10)` parsed incorrectly? Because ANTLR4 works in two stages: lexer and parser. During the lexer stage, the `10` from `node_NUMBER` is recognized as a single token, so in the parser stage the input looks like `node_NUMBER (node_NUMBER)` rather than `node_NUMBER (node_CHAR node_CHAR)`, and an error occurs.
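To make the lexer stage concrete, here is a minimal Python sketch of a longest-match tokenizer. It is only a simplified model of the behavior described above, not ANTLR4 itself; the `LITERALS` list and the `tokenize` helper are names made up for illustration, and the literals are simply the quoted strings that appear in `Grammar.g4`.

```python
# Simplified model of a longest-match lexer (an assumption-level sketch,
# not ANTLR4's implementation). LITERALS mirrors the quoted strings that
# appear in the parser rules of Grammar.g4.
LITERALS = ["10", "99", "(", ")", "\n"] + list("0123456789abcdef")


def tokenize(text):
    """Split text into tokens, always taking the longest matching literal."""
    tokens = []
    pos = 0
    while pos < len(text):
        candidates = [lit for lit in LITERALS if text.startswith(lit, pos)]
        if not candidates:
            raise ValueError(f"no token matches at offset {pos}")
        best = max(candidates, key=len)  # the longest match wins
        tokens.append(best)
        pos += len(best)
    return tokens


# The "10" inside the parentheses comes out as one token, not as "1", "0".
print(tokenize("10(10)\n"))
# A hypothetical input whose hex digits do not collide with a NUMBER literal.
print(tokenize("10(ab)\n"))
```

In this model the second `10` never reaches the parser as two separate `node_CHAR` tokens, which matches the error seen for `10(10)`.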

In an ANTLR4 grammar, lexer rules begin with an uppercase letter and parser rules begin with a lowercase letter, so we should state the lexer rules explicitly. Patched version, `Grammar_patch.g4`:
```
grammar Grammar_patch;
entry
    : node_A EOF
    ;
node_A
    : node_NUMBER Node_STRING '\n'
    ;
node_NUMBER
    : '10'
    | '99'
    ;
Node_STRING
    : '(' Node_HEXSTRING ')'
    ;
Node_HEXSTRING
    : 
    | Node_CHAR Node_HEXSTRING
    ;
Node_CHAR
    : '0'
    | '1'
    | '2'
    | '3'
    | '4'
    | '5'
    | '6'
    | '7'
    | '8'
    | '9'
    | 'a'
    | 'b'
    | 'c'
    | 'd'
    | 'e'
    | 'f'
    ;
```

Testing again:

![Screen Shot 2024-01-18 at 17 18 58](https://github.com/AFLplusplus/Grammar-Mutator/assets/21287921/6a167205-c337-40d6-a9e5-49693dcf608a)

>The "warning" tells us that the rule can match the empty string, which may cause parsing issues in ANTLR4, but we can easily avoid it by marking the rule as a fragment: `fragment Node_HEXSTRING`

Maybe we could improve the JSON-to-g4 generation code so that it distinguishes between lexer and parser rules?
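As a starting point for discussion, here is a rough Python sketch of one possible heuristic: treat a rule as a lexer rule when all of its alternatives consist only of literals or other lexer rules, and keep the start rule as a parser rule. This is not Grammar-Mutator's actual generator code; `is_nonterminal`, `find_lexer_rules` and `rule_name` are hypothetical helpers, and the `Node_` / `node_` prefixes just follow the naming used in the patch above.

```python
import json
import re


def is_nonterminal(symbol):
    """Symbols written as <NAME> are nonterminals; anything else is a literal."""
    return re.fullmatch(r"<[^<>]+>", symbol) is not None


def find_lexer_rules(grammar, start):
    """Return the nonterminals that could be emitted as lexer rules.

    Assume every rule except the start rule qualifies, then repeatedly
    drop rules that reference a nonterminal outside the candidate set.
    """
    lexer = set(grammar) - {start}
    changed = True
    while changed:
        changed = False
        for name in sorted(lexer):  # sorted() copies, so discarding is safe
            symbols = [s for alt in grammar[name] for s in alt]
            if any(is_nonterminal(s) and s not in lexer for s in symbols):
                lexer.discard(name)
                changed = True
    return lexer


def rule_name(symbol, lexer_rules):
    """Uppercase first letter for lexer rules, lowercase for parser rules."""
    prefix = "Node_" if symbol in lexer_rules else "node_"
    return prefix + symbol[1:-1]


# The lex.json test case from this issue (raw string keeps "\n" as JSON escape).
grammar = json.loads(r'''
{
    "<A>": [["<NUMBER>", "<STRING>", "\n"]],
    "<NUMBER>": [["10"], ["99"]],
    "<STRING>": [["(", "<HEXSTRING>", ")"]],
    "<HEXSTRING>": [["<CHAR>", "<HEXSTRING>"], []],
    "<CHAR>": [
        ["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
        ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"]
    ]
}
''')

lexer_rules = find_lexer_rules(grammar, "<A>")
for symbol in grammar:
    print(symbol, "->", rule_name(symbol, lexer_rules))
```

Note that this heuristic also classifies `<NUMBER>` as a lexer rule, unlike the hand-written patch, and it does not decide which rules should become `fragment`s, so it would still need tuning before being used in the generator.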
