tokenize._all_string_prefixes does not list t-string prefixes #134675

Closed
ericvsmith opened this issue May 25, 2025 · 5 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ericvsmith
Member

ericvsmith commented May 25, 2025

Bug report

Bug description:

tokenize._all_string_prefixes() does not include t-string prefixes.

$ ./python.bat
Running Release|x64 interpreter...
Python 3.15.0a0 (heads/main-dirty:7b1a7002312, May 25 2025, 22:44:32) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> tokenize._all_string_prefixes()
{'', 'rf', 'fR', 'rF', 'Br', 'BR', 'f', 'F', 'FR', 'Fr', 'Rf', 'rb', 'rB', 'u', 'U', 'R', 'fr', 'r', 'B', 'RB', 'bR', 'br', 'Rb', 'RF', 'b'}
>>> len(_)
25

This also affects tokenize.endpats and tokenize.StringPrefix.
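A quick way to see the knock-on effect is to recover the prefix set from each of the three objects and check them for t-prefixes. This is only a sketch: all three names are private, and it assumes `StringPrefix` is a parenthesized `|`-separated alternation and that `endpats` is keyed by prefix plus opening quote, as in `Lib/tokenize.py`.

```python
import tokenize

# All three names are private and undocumented; this assumes they keep the
# shape they have in Lib/tokenize.py.
prefixes = tokenize._all_string_prefixes()

# StringPrefix is assumed to be a regex alternation of the form "(p1|p2|...)".
from_regex = set(tokenize.StringPrefix[1:-1].split("|"))

# endpats is assumed to be keyed by prefix + opening quote (', ", ''' or """),
# so stripping trailing quote characters leaves just the prefix.
from_endpats = {key.rstrip("'\"") for key in tokenize.endpats}

for name, found in [("_all_string_prefixes", prefixes),
                    ("StringPrefix", from_regex),
                    ("endpats", from_endpats)]:
    print(name, "has t-prefixes:", any("t" in p.lower() for p in found))
```

Because `StringPrefix` and `endpats` are both built from `_all_string_prefixes()`, all three should report the same answer on any given build.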

CPython versions tested on:

3.15

Operating systems tested on:

Windows

Linked PRs

@ericvsmith ericvsmith added the type-bug An unexpected behavior, bug, or error label May 25, 2025
@ericvsmith
Member Author

@lysnikolaou @davepeck

@larryhastings
Contributor

I wrote a Python program that attempts all possible prefixes; it tries each one with eval and discards any that raise a SyntaxError. Here's the list I produced:

['', 'b', 'f', 'r', 't', 'u', 'B', 'F', 'R', 'T', 'U', 'br', 'bR', 'fr', 'fR', 'rb', 'rf', 'rt', 'rB', 'rF', 'rT', 'tr', 'tR', 'Br', 'BR', 'Fr', 'FR', 'Rb', 'Rf', 'Rt', 'RB', 'RF', 'RT', 'Tr', 'TR']

@picnixz picnixz added the stdlib Python modules in the Lib dir label May 25, 2025
@terryjreedy
Member

@ericvsmith
Member Author

I guess the best way to test this is to use @larryhastings's exhaustive test: find all 2-letter combinations (and maybe 3-letter ones too, if that's not too slow) and check that the result matches _all_string_prefixes. I don't see a way to test this using a public API: _all_string_prefixes, endpats, and StringPrefix are all undocumented. Probably the best thing to use would be to compare the discovered list against set(tokenize.StringPrefix[1:-1].split('|')).
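A minimal sketch of that comparison might look like the following. The helper name `discover_prefixes` and its `max_len` parameter are made up for illustration, and it uses `compile()` in place of `eval` so nothing is executed. It assumes no valid prefix is longer than two characters and that `StringPrefix` keeps its `(a|b|...)` shape.

```python
import itertools
import string
import tokenize


def discover_prefixes(max_len=2):
    """Return every letter sequence up to max_len that compiles as a string prefix."""
    found = set()
    for length in range(max_len + 1):
        for chars in itertools.product(string.ascii_letters, repeat=length):
            p = "".join(chars)
            try:
                # Only compile; there is no need to evaluate the literal.
                compile(f'{p}""', "<prefix>", "eval")
            except SyntaxError:
                continue
            found.add(p)
    return found


discovered = discover_prefixes()
# Assumption: StringPrefix is "(" + "|".join(prefixes) + ")".
listed = set(tokenize.StringPrefix[1:-1].split("|"))

missing = discovered - listed  # accepted by the compiler, absent from tokenize
extra = listed - discovered    # listed by tokenize, rejected by the compiler
print("missing:", sorted(missing))
print("extra:", sorted(extra))
```

On a build affected by this issue, `missing` would be expected to contain the t-string prefixes; on a build where the two agree, both sets are empty.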

@ericvsmith
Member Author

I'm thinking of using this:

import string
import itertools

# Try all lengths until we find a length that has zero valid prefixes.
# This will miss the case where for example there are no valid 3
# character prefixes, but there are valid 4 character prefixes.

valid_prefixes = set()
for length in itertools.count():
    num_at_this_length = 0
    for prefix in itertools.combinations(string.ascii_lowercase, length):
        for t in itertools.permutations(prefix):
            for u in itertools.product(*[(c, c.upper()) for c in t]):
                p = ''.join(u)
                if p == "not":
                    # 'not' can never be a string prefix, because it's a valid
                    # expression: not ""
                    continue
                try:
                    eval(f'{p}""')
                    valid_prefixes.add(p)
                    num_at_this_length += 1
                except SyntaxError:
                    pass
    if num_at_this_length == 0:
        break

print(valid_prefixes)
print(len(valid_prefixes))

Which in 3.9 produces:

{'', 'u', 'rb', 'r', 'f', 'Rf', 'rF', 'fr', 'rB', 'Fr', 'U', 'BR', 'br', 'B', 'bR', 'fR', 'F', 'rf', 'b', 'Br', 'RF', 'FR', 'Rb', 'R', 'RB'}
25

And in 3.15 produces:

{'', 'b', 'bR', 'Br', 'RT', 'rb', 'fr', 'R', 'tR', 'B', 'f', 'TR', 'FR', 'RB', 't', 'rt', 'Fr', 'br', 'r', 'BR', 'rf', 'RF', 'U', 'Rb', 'F', 'Rt', 'rT', 'rB', 'Rf', 'Tr', 'rF', 'tr', 'fR', 'u', 'T'}
35

I'll produce a PR shortly.

ericvsmith added a commit that referenced this issue May 26, 2025
…s doc, and add a test to make sure we catch this error in the future. (#134734)

* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 26, 2025
…nalysis doc, and add a test to make sure we catch this error in the future. (pythonGH-134734)

* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
(cherry picked from commit 08c78e0)

Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
ericvsmith added a commit that referenced this issue May 26, 2025
…analysis doc, and add a test to make sure we catch this error in the future. (GH-134734) (#134739)

gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. (GH-134734)

* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
(cherry picked from commit 08c78e0)

Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>