Thank you so much for looking at this! So my main motivation around reusing the lexer was to avoid too much code duplication (especially around the more awkward bits of Lua's syntax, like long strings and
Oh, just to clarify here, the lexer is only used for error-reporting (#1298, etc...), and not by Lua's compiler (that's written in Java instead). So while no bugs are desirable, the fallout here is much smaller!
I think that sounds sensible. A couple of thoughts:
Then I think the logic roughly follows what you had already:
I don't know if that sounds sensible? You could probably fit this into the existing lexer, by returning the continuation as a tuple of a function + arguments to that function. So for instance,
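One possible shape for a "function + arguments" continuation, as a minimal sketch only. The names long_string and resume, and the return values, are hypothetical and are not taken from the existing lexer:

```lua
-- Hypothetical sketch: none of these names come from the existing lexer.
local unpack = table.unpack or unpack -- Lua 5.1 and 5.2+ compatibility

-- Lex the body of a long string whose closing delimiter is
-- "]" .. ("="):rep(level) .. "]", starting at column `start`.
-- Returns the token kind, the last column consumed, and a continuation
-- (or nil when the string was closed on this line).
local function long_string(line, start, level)
    local close = "]" .. ("="):rep(level) .. "]"
    local _, finish = line:find(close, start, true) -- plain-text search
    if finish then
        return "string", finish, nil
    end
    -- Not closed yet: the continuation is the function plus its extra arguments.
    return "string", #line, { long_string, level }
end

-- Resume a continuation at the start of the next line.
local function resume(cont, line)
    local fn = cont[1]
    return fn(line, 1, unpack(cont, 2))
end
```

A caller would keep the returned table alongside the line and pass it to resume when lexing the following line, so the lexer never needs to see the whole multiline string at once.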
When reviewing this PR, @SquidDev suggested:
I am interested in updating edit.lua's syntax highlighter, but I'm not convinced that reusing the existing lexer is the best way. My thoughts are below.
Places where the current lexer isn't applicable to edit.lua
Firstly, I'm open to other opinions, but my gut feeling is that we should only reuse the lexer if lexer.lua itself requires no modification. The reason is that bugs in the lexer (and therefore in the compiler) are much worse than bugs in the syntax highlighter. Any bugs in the syntax highlighter have a purely cosmetic impact (highlighting something the wrong colour), whereas (in the worst case) bugs in the lexer could result in the user's code getting parsed incorrectly! To avoid the risk of introducing obscure bugs into the lexer while trying to make it more reusable, I think we ought to leave the lexer untouched.
Unfortunately, I think there are a fair few places where we would need to modify the lexer's code to make it work for syntax highlighting. A fairly simple example is that we want different behaviour for unterminated multiline strings in the compiler and in edit.lua: the former gives an ERROR token, whereas we'd like the syntax highlighter to still display this as a string!
Another example is the way that positions are reported. lexer.lua stores its position as a single integer, rather than storing (line number, position within line). This is slightly inconvenient for use in edit.lua, because the display is drawn line-by-line.
Why I'm not convinced reusing the lexer would simplify edit.lua
Also, I'm not sure we'd actually reduce the amount of code needed in edit.lua that much by using the lexer.
While the existing lexer can be started from anywhere in the document, this doesn't include starting the lexer mid-token. E.g. if I have a multiline string, and the user changes a character in the middle of this string, we'd have to pass the whole multiline string to the lexer, since there's no way of passing the context of "we're in the middle of a string" to the lexer. "Rewinding" to the beginning of the previous token before calling the lexer would fix this issue, but we'd then need to modify the internal state stored by edit.lua to contain some mapping from position to token, and make sure this is updated appropriately.
We'd also need logic in edit.lua to figure out how many times to call lex_one each time a character is changed. For instance, if the user removes a "start multiline comment" token which previously started a long comment, then all of the code that was previously commented out needs to be re-lexed, even if this is many tokens! On the flip side, we don't want to re-lex the entire "rest of the file" every time the user makes a small change at the top of the file.
What still needs to be changed, and what's my solution?
Regardless, I still think the syntax highlighter in edit.lua should be rewritten. The reason is that it currently only highlights one line at a time, which means it can't properly highlight multiline comments or multiline strings.
My proposed solution is as follows:
edit.lua should store a list of triples (one for each line), containing:
When the user updates a line (entering or deleting a character) then always re-lex the entire line (this is something we already do), with additional context of whether we're in a string or comment. Additionally:
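As a rough illustration of re-lexing a line with carried-over context, here is a minimal sketch. It is an assumption-laden simplification rather than the proposed design: it caches a single end-of-line state per line instead of the triple mentioned above, it only understands level-0 long brackets ("[[ ... ]]" and "--[[ ... ]]"), it ignores quoted strings, and the names scan_line, relex_from and end_states are hypothetical.

```lua
-- Hypothetical sketch: simplified on purpose, see the assumptions above.

-- Scan one line starting in `state` ("code", "string" or "comment") and
-- return the state at the end of the line.
local function scan_line(line, state)
    local pos = 1
    while pos <= #line do
        if state ~= "code" then
            -- Inside a long string or comment: look for the closing "]]".
            local _, finish = line:find("]]", pos, true)
            if not finish then return state end
            state, pos = "code", finish + 1
        elseif line:sub(pos, pos + 3) == "--[[" then
            state, pos = "comment", pos + 4
        elseif line:sub(pos, pos + 1) == "--" then
            return "code" -- a line comment ends with the line
        elseif line:sub(pos, pos + 1) == "[[" then
            state, pos = "string", pos + 2
        else
            pos = pos + 1
        end
    end
    return state
end

-- `lines` is the document and `end_states[i]` caches the state after line i.
-- Re-lex from line `first` and stop as soon as a line's end state matches
-- the cached one, since later lines would then lex exactly as before.
-- Returns the last line that had to be re-lexed (and so redrawn).
local function relex_from(lines, end_states, first)
    local state = end_states[first - 1] or "code"
    for i = first, #lines do
        state = scan_line(lines[i], state)
        if state == end_states[i] then return i end
        end_states[i] = state
    end
    return #lines
end
```

With a cache like this, an ordinary edit re-lexes a single line, while removing a "--[[" re-lexes only as far as the line where the old comment ended. The sketch only covers edits within a line; inserting or deleting whole lines would also need the cache to be shifted.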