Commit 7e4f8f4

Make {eol_}comments_re read-only and non-init arguments in ParserConfig (#352)
* [buffering] drop forced multiline match for string patterns

  Previously, when scanning for matches to a regex, if the type of the pattern was `str`, the pattern was always compiled with `re.MULTILINE`.

  Recent changes to `ParserConfig` [0] changed the type used for regex matches in generated code from `str` to `re.Pattern` which could lead to a difference in behavior from previous versions where a defined comments or eol_comments may have been implicitly relying on the `re.MULTILINE` flag.

  After discussion [1], it has been determined that usage of `re` flags within TatSu should be deprecated in favor of users specifying the necessary flags within patterns.

  As such, drop the `re.MULTILINE` flag for strings compiled on the fly.

  [0]: #338
  [1]: #351 (comment)

* [grammar] make eol_comments multiline match

  Make the default eol_comments regex use multiline matching.

  Recent changes to `ParserConfig` [0] now use a precompiled regex (an `re.Pattern`) instead of compiling the `str` regex on the fly. The `Tokenizer` previously assumed `str` type regexes should all be `re.MULTILINE` regardless of options defined in the regex itself when compiling the pattern. This behavior has since changed to no longer automatically apply and thus requires configurations to specify the option in the pattern.

  [0]: #338

* [infos] make {eol_}comments_re read-only attributes

  Previously, the `eol_comments_re` and `comments_re` attributes were public init arguments, were modifiable, and could thus become out of sync with the `eol_comments` and `comments` attributes.

  Also, with recent changes to `ParserConfig` [0], there were two ways to initialize the regex values for comments and eol_comments directives; either via the constructor using the *_re variables or by using the sister string arguments and relying on `__post_init__` to compile the values which trumped the explicit *_re argument values.

  Now, the constructor interface has been simplified to not take either `eol_comments_re` or `comments_re` as arguments. Callers may only use `eol_comments` and `comments`. The `eol_comments_re` and `comments_re` attributes are still public, but are read-only so they are always a reflection of their sister string values passed into the constructor.

  [0]: #200

* [codegen] migrate to {eol_}comments
* [ngcodegen] migrate to {eol_}comments
* [bootstrap] migrate to {eol_}comments
* [lint] resolve errors
* [docs] note {eol_}comments directive behavior changes
* [docs] update syntax to reflect {eol_}comments arguments
* [test] fix test_parse_hash to use eol_comments
* [test] explicitly use multiline match in test_patterns_with_newlines
1 parent 1b632e8 commit 7e4f8f4

16 files changed, +63 -40 lines changed


docs/directives.rst

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,8 @@ Specifies a regular expression to identify and exclude inline (bracketed) commen

     @@comments :: /\(\*((?:.|\n)*?)\*\)/

+.. note::
+    Prior to 5.12.1, comments implicitly had the `(?m) <https://docs.python.org/3/library/re.html#re.MULTILINE>`_ option defined. This is no longer the case.

 ``@@eol_comments :: <regexp>``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -39,6 +41,8 @@ Specifies a regular expression to identify and exclude end-of-line comments befo

     @@eol_comments :: /#([^\n]*?)$/

+.. note::
+    Prior to 5.12.1, eol_comments implicitly had the `(?m) <https://docs.python.org/3/library/re.html#re.MULTILINE>`_ option defined. This is no longer the case.

 ``@@ignorecase :: <bool>``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
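
Reviewer note: the two notes above can be checked with the standard `re` module alone. The sketch below is not TatSu code; it assumes only the example patterns shown in the directives documentation and a made-up input string. It suggests that the `@@comments` example does not depend on the flag, since `re.MULTILINE` only changes where `^` and `$` match, while the `@@eol_comments` example does, because it ends in `$`.

```python
import re

COMMENTS = r"\(\*((?:.|\n)*?)\*\)"  # example pattern from the @@comments docs
EOL_COMMENTS = r"#([^\n]*?)$"       # example pattern from the @@eol_comments docs

# Made-up input: a block comment, then an end-of-line comment that is
# NOT on the last line of the text.
text = "(* block\ncomment *)\nx = 1 # trailing\ny = 2\n"
eol_pos = text.index("#")

# The block-comment pattern has no ^ or $, so the flag makes no difference.
block_plain = re.compile(COMMENTS).match(text)
block_multi = re.compile(COMMENTS, re.MULTILINE).match(text)

# Without the flag, $ only matches at the very end of the text (or just
# before a final newline), so the mid-text comment is not recognized.
eol_plain = re.compile(EOL_COMMENTS).match(text, eol_pos)

# Inlining (?m) in the pattern restores the previous behavior.
eol_inline = re.compile("(?m)" + EOL_COMMENTS).match(text, eol_pos)

print(block_plain.group(0) == block_multi.group(0))
print(eol_plain)
print(eol_inline.group(0))
```

Grammars that define their own `$`-terminated comment patterns would therefore likely need the `(?m)` prefix after upgrading, as the default `@@eol_comments` in `grammar/tatsu.ebnf` now has.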

docs/syntax.rst

Lines changed: 4 additions & 4 deletions
@@ -735,11 +735,11 @@ Comments
 ~~~~~~~~

 Parsers will skip over comments specified as a regular expression using
-the ``comments_re`` parameter:
+the ``comments`` parameter:

 .. code:: python

-    parser = MyParser(text, comments_re="\(\*.*?\*\)")
+    parser = MyParser(text, comments="\(\*.*?\*\)")

 For more complex comment handling, you can override the
 ``Buffer.eat_comments()`` method.
@@ -751,8 +751,8 @@ comments separately:

     parser = MyParser(
         text,
-        comments_re="\(\*.*?\*\)",
-        eol_comments_re="#.*?$"
+        comments="\(\*.*?\*\)",
+        eol_comments="#.*?$"
     )

 Both patterns may also be specified within a grammar using the

grammar/tatsu.ebnf

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 @@grammar :: TatSu
 @@whitespace :: /\s+/
 @@comments :: ?"(?sm)[(][*](?:.|\n)*?[*][)]"
-@@eol_comments :: ?"#[^\n]*$"
+@@eol_comments :: ?"(?m)#[^\n]*$"
 @@parseinfo :: True
 @@left_recursion :: False

tatsu/bootstrap.py

Lines changed: 4 additions & 4 deletions
@@ -35,8 +35,8 @@ def __init__(self, text, /, config: ParserConfig | None = None, **settings):
             ignorecase=False,
             namechars='',
             parseinfo=True,
-            comments_re='(?sm)[(][*](?:.|\\n)*?[*][)]',
-            eol_comments_re='#[^\\n]*$',
+            comments='(?sm)[(][*](?:.|\\n)*?[*][)]',
+            eol_comments='(?m)#[^\\n]*$',
             keywords=KEYWORDS,
             start='start',
         )
@@ -55,8 +55,8 @@ def __init__(self, /, config: ParserConfig | None = None, **settings):
             ignorecase=False,
             namechars='',
             parseinfo=True,
-            comments_re='(?sm)[(][*](?:.|\\n)*?[*][)]',
-            eol_comments_re='#[^\\n]*$',
+            comments='(?sm)[(][*](?:.|\\n)*?[*][)]',
+            eol_comments='(?m)#[^\\n]*$',
             keywords=KEYWORDS,
             start='start',
         )

tatsu/buffering.py

Lines changed: 1 addition & 1 deletion
@@ -357,7 +357,7 @@ def _scanre(self, pattern):
         if isinstance(pattern, RETYPE):
             cre = pattern
         else:
-            cre = re.compile(pattern, re.MULTILINE)
+            cre = re.compile(pattern)
         return cre.match(self.text, self.pos)

     @property
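
Reviewer note: a standalone sketch of the dispatch in `_scanre` after this change may help callers who pass patterns directly. The function name `scanre` and the sample text are hypothetical, and `re.Pattern` stands in for TatSu's `RETYPE`, which is an assumption on my part. It shows three ways a caller might end up scanning: a plain string (no longer multiline), a string with an inline `(?m)`, and a precompiled pattern that carries its own flags.

```python
import re


def scanre(pattern, text, pos=0):
    """Match `pattern` at `pos`, mirroring the post-change dispatch.

    Hypothetical free function; the real method lives on the buffer class.
    """
    if isinstance(pattern, re.Pattern):
        cre = pattern  # precompiled: used exactly as given, flags included
    else:
        cre = re.compile(pattern)  # str: compiled without implicit re.MULTILINE
    return cre.match(text, pos)


text = "x  # note\ny\n"
pos = text.index("#")

plain = scanre(r"#[^\n]*$", text, pos)        # relies on the dropped flag
inline = scanre(r"(?m)#[^\n]*$", text, pos)   # flag specified in the pattern
precompiled = scanre(re.compile(r"#[^\n]*$", re.MULTILINE), text, pos)

print(plain, inline.group(0), precompiled.group(0))
```

Of the two working variants, the inline flag matches the direction stated in the commit message, which prefers flags specified within patterns over `re` flags passed by TatSu.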

tatsu/codegen/objectmodel.py

Lines changed: 3 additions & 3 deletions
@@ -67,11 +67,11 @@ def _get_full_name(cls):
     # Try to reference the class
     try:
         idents = name.split('.')
-        _cls = getattr(module, idents[0])
+        cls_ = getattr(module, idents[0])
         for ident in idents[1:]:
-            _cls = getattr(_cls, ident)
+            cls_ = getattr(cls_, ident)

-        assert _cls == cls
+        assert cls_ == cls
     except AttributeError as e:
         raise CodegenError(
             "Couldn't find base type, it has to be importable",

tatsu/codegen/python.py

Lines changed: 8 additions & 8 deletions
@@ -462,8 +462,8 @@ def render_fields(self, fields):
         left_recursion = self.node.config.left_recursion
         parseinfo = self.node.config.parseinfo
         namechars = repr(self.node.config.namechars or '')
-        comments_re = repr(self.node.config.comments_re)
-        eol_comments_re = repr(self.node.config.eol_comments_re)
+        comments = repr(self.node.config.comments)
+        eol_comments = repr(self.node.config.eol_comments)

         rules = '\n'.join(
             [self.get_renderer(rule).render() for rule in self.node.rules],
@@ -488,8 +488,8 @@ def render_fields(self, fields):
             parseinfo=parseinfo,
             keywords=keywords,
             namechars=namechars,
-            comments_re=comments_re,
-            eol_comments_re=eol_comments_re,
+            comments=comments,
+            eol_comments=eol_comments,
         )

     abstract_rule_template = """
@@ -535,8 +535,8 @@ def __init__(self, text, /, config: ParserConfig | None = None, **settings):
                             ignorecase={ignorecase},
                             namechars={namechars},
                             parseinfo={parseinfo},
-                            comments_re={comments_re},
-                            eol_comments_re={eol_comments_re},
+                            comments={comments},
+                            eol_comments={eol_comments},
                             keywords=KEYWORDS,
                             start={start!r},
                         )
@@ -554,8 +554,8 @@ def __init__(self, /, config: ParserConfig | None = None, **settings):
                             ignorecase={ignorecase},
                             namechars={namechars},
                             parseinfo={parseinfo},
-                            comments_re={comments_re},
-                            eol_comments_re={eol_comments_re},
+                            comments={comments},
+                            eol_comments={eol_comments},
                             left_recursion={left_recursion},
                             keywords=KEYWORDS,
                             start={start!r},

tatsu/g2e/semantics.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@

 def camel2py(name):
     return re.sub(
-        '([a-z0-9])([A-Z])',
+        r'([a-z0-9])([A-Z])',
         lambda m: m.group(1) + '_' + m.group(2).lower(),
         name,
     )

tatsu/grammars.py

Lines changed: 1 addition & 1 deletion
@@ -519,7 +519,7 @@ def _to_str(self, lean=False):

         if multi:
             return '\n|\n'.join(indent(o) for o in options)
-        elif len(options) and len(single) > PEP8_LLEN:
+        elif options and len(single) > PEP8_LLEN:
             return '| ' + '\n| '.join(o for o in options)
         else:
             return single

tatsu/infos.py

Lines changed: 27 additions & 7 deletions
@@ -3,7 +3,7 @@
 import copy
 import dataclasses
 import re
-from collections.abc import Callable, Mapping
+from collections.abc import Callable, MutableMapping
 from itertools import starmap
 from typing import Any, NamedTuple

@@ -30,8 +30,8 @@ class ParserConfig:
     start_rule: str | None = None # FIXME
     rule_name: str | None = None # Backward compatibility

-    comments_re: re.Pattern | None = None
-    eol_comments_re: re.Pattern | None = None
+    _comments_re: re.Pattern | None = dataclasses.field(default=None, init=False, repr=False)
+    _eol_comments_re: re.Pattern | None = dataclasses.field(default=None, init=False, repr=False)

     tokenizercls: type[Tokenizer] | None = None # FIXME
     semantics: type | None = None
@@ -64,9 +64,17 @@ def __post_init__(self): # pylint: disable=W0235
         if self.ignorecase:
             self.keywords = [k.upper() for k in self.keywords]
         if self.comments:
-            self.comments_re = re.compile(self.comments)
+            self._comments_re = re.compile(self.comments)
         if self.eol_comments:
-            self.eol_comments_re = re.compile(self.eol_comments)
+            self._eol_comments_re = re.compile(self.eol_comments)
+
+    @property
+    def comments_re(self) -> re.Pattern | None:
+        return self._comments_re
+
+    @property
+    def eol_comments_re(self) -> re.Pattern | None:
+        return self._eol_comments_re

     @classmethod
     def new(
@@ -84,7 +92,7 @@ def effective_rule_name(self):
         # note: there are legacy reasons for this mess
         return self.start_rule or self.rule_name or self.start

-    def _find_common(self, **settings: Any) -> Mapping[str, Any]:
+    def _find_common(self, **settings: Any) -> MutableMapping[str, Any]:
         return {
             name: value
             for name, value in settings.items()
@@ -101,8 +109,20 @@ def replace_config(
         else:
             return self.replace(**vars(other))

+    # non-init fields cannot be used as arguments in `replace`, however
+    # they are values returned by `vars` and `dataclass.asdict` so they
+    # must be filtered out.
+    # If the `ParserConfig` dataclass drops these fields, then this filter can be removed
+    def _filter_non_init_fields(self, settings: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
+        for field in [
+            field.name for field in dataclasses.fields(self) if not field.init
+        ]:
+            if field in settings:
+                del settings[field]
+        return settings
+
     def replace(self, **settings: Any) -> ParserConfig:
-        overrides = self._filter_non_init_fields(self._find_common(**settings))
+        overrides = self._filter_non_init_fields(self._find_common(**settings))
         result = dataclasses.replace(self, **overrides)
         if 'grammar' in overrides:
             result.name = result.grammar
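
Reviewer note: the pattern used in `tatsu/infos.py` (a private `init=False` field, a read-only property, and a filter in front of `dataclasses.replace`) can be tried in isolation. The sketch below is a reduced, hypothetical `MiniConfig`, not the real `ParserConfig`; its `replace` builds a filtered copy of the settings instead of mutating a mapping, which is a simplification I chose for brevity.

```python
from __future__ import annotations

import dataclasses
import re
from typing import Any


@dataclasses.dataclass
class MiniConfig:
    """Hypothetical stand-in for ParserConfig, reduced to one setting."""

    eol_comments: str | None = None
    # Derived value: not accepted by __init__, hidden from repr,
    # and filled in by __post_init__.
    _eol_comments_re: re.Pattern | None = dataclasses.field(
        default=None, init=False, repr=False
    )

    def __post_init__(self) -> None:
        if self.eol_comments:
            self._eol_comments_re = re.compile(self.eol_comments)

    @property
    def eol_comments_re(self) -> re.Pattern | None:
        # No setter is defined, so assigning to this name raises AttributeError.
        return self._eol_comments_re

    def replace(self, **settings: Any) -> MiniConfig:
        # vars(self) includes the non-init field, but dataclasses.replace()
        # rejects it, so keep only names that __init__ accepts.
        init_names = {f.name for f in dataclasses.fields(self) if f.init}
        overrides = {k: v for k, v in settings.items() if k in init_names}
        return dataclasses.replace(self, **overrides)


config = MiniConfig(eol_comments=r"(?m)#[^\n]*$")
copied = config.replace(**vars(config))             # filter drops _eol_comments_re
updated = config.replace(eol_comments=r"(?m)//[^\n]*$")

print(copied.eol_comments_re.pattern)
print(updated.eol_comments_re.pattern)
```

Because `dataclasses.replace` constructs a new instance through `__init__`, `__post_init__` runs again and the compiled pattern is rebuilt from the string, which is what keeps the two values in sync.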
