Skip to content

Conversation

@tetektoza
Copy link
Contributor

When users write identifiers starting with underscore, (e.g.: let _c: u32), the parser produces an uninformative error.

Root cause of that is that lexer splits _c into two separate tokens, and the parser had no grammar rule matching the pattern, so LARLPOP generates a generir error listing what tokens it expected at that position (which happened to be '(' for tuple destructuring).

So, this adds new lexer token - InvalidLeadingUnderscoreIdent that catches any leading underscore next to identifiers.

Before:

Error [EPAR0370005]: expected an identifier, '(' -- found '_'
    --> \\?\C:\Users\tetektoza\source\repos\leo\test_underscore_project\test_underscore\src\main.leo:3:13
     |
   3 |         let _c: u32 = a + b;
     |             ^

After:

Error [EPAR0370047]: Identifier cannot start with an underscore
    --> \\?\C:\Users\tetektoza\source\repos\leo\test_underscore_project\test_underscore\src\main.leo:3:13
     |
   3 |         let _c: u32 = a + b;
     |             ^^
     |
     = Identifiers must start with a letter.

Resolves: #28755

@tetektoza
Copy link
Contributor Author

tetektoza commented Oct 5, 2025

Argh, I didn't see it before (and I should've checked first), but now after pasting before and after to the PR message, it looks like the error message has been improved on mainnet. 😅 Maybe the current one (which is different compared to the one from #28755) is enough and this PR could be closed and issue too, since it seems like there was some improvement in this error message. Not sure, but pasting my solution anyways.

@mohammadfawaz
Copy link
Collaborator

In https://github.com/ProvableHQ/leo/blob/mainnet/compiler/parser-lossless/src/lib.rs, we already have a function that checks identifiers. Is it better to do this check there? I'm not sure how I feel about inserting invalid regrexes in the tokenizer 🤔

Also, when adding new errors, we have this quirk that new errors should always be appended at the end of the errors file, so that old error codes don't change.

@tetektoza
Copy link
Contributor Author

Point 1 - hmm, you're talking about check_identifier right? I took a look on it, but _c becomes two tokens at this point, Underscore and Identifier, so the function runs only on Identifier token. I can extend it, but it seemed counter-intuitive at this point, it's also possible to modify grammar, but probably with a ton of changes.

Point 2 - ahhh, good point, thanks. I can adjust it.

Let me know about point 1 and my first comment in the PR, maybe we don't need it anymore. Not sure.

@mohammadfawaz
Copy link
Collaborator

Point 1 - hmm, you're talking about check_identifier right? I took a look on it, but _c becomes two tokens at this point, Underscore and Identifier, so the function runs only on Identifier token. I can extend it, but it seemed counter-intuitive at this point, it's also possible to modify grammar, but probably with a ton of changes.

Oh I now undersatnd. Yeah then doing it in in the tokenizer is probably the way to go. Thanks!

@formatted
identifier_cannot_start_with_underscore {
args: (),
msg: "Identifier cannot start with an underscore",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg: "Identifier cannot start with an underscore",
msg: "Identifiers cannot start with an underscore.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

When users write identifiers starting with underscore,
(e.g.: let _c: u32), the parser produces an uninformative error.

Root cause of that is that lexer splits `_c` into two separate tokens,
and the parser had no grammar rule matching the pattern, so LARLPOP
generates a generir error listing what tokens it expected at that
position (which happened to be '(' for tuple destructuring).

So, this adds new lexer token - `InvalidLeadingUnderscoreIdent` that
catches any leading underscore next to identifiers.
@tetektoza tetektoza force-pushed the fix/28755_add_rules_to_grammar_for_underscore branch from 88a72cb to 019ccf1 Compare October 7, 2025 00:39
@tetektoza
Copy link
Contributor Author

Also moved the error at the end of the file as discussed, removed a lot of changes with newly updated error codes.

Copy link
Collaborator

@mohammadfawaz mohammadfawaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mohammadfawaz
Copy link
Collaborator

Closing in favour of #28909

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] The parser error provided when an identifier has a leading underscore is very uninformative.

2 participants