tokenize._all_string_prefixes does not list t-string prefixes #134675

ericvsmith · 2025-05-25T21:48:06Z

Bug report

Bug description:

tokenize._all_string_prefixes() does not include t-string prefixes.

$ ./python.bat
Running Release|x64 interpreter...
Python 3.15.0a0 (heads/main-dirty:7b1a7002312, May 25 2025, 22:44:32) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> tokenize._all_string_prefixes()
{'', 'rf', 'fR', 'rF', 'Br', 'BR', 'f', 'F', 'FR', 'Fr', 'Rf', 'rb', 'rB', 'u', 'U', 'R', 'fr', 'r', 'B', 'RB', 'bR', 'br', 'Rb', 'RF', 'b'}
>>> len(_)
25

This also affects tokenize.endpats and tokenize.StringPrefix, which are both built from the same prefix set.
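For context, the prefix set is generated rather than listed by hand. As far as I can tell from Lib/tokenize.py, _all_string_prefixes() expands a short list of lowercase base prefixes into every ordering and upper/lower casing, and StringPrefix is a regex alternation built from that set. A missing base prefix therefore drops out of all three tables at once. The sketch below imitates that shape. The helper name expand_prefixes and the two base lists are assumptions for illustration, not the module's actual code.

```python
import itertools
import re


def expand_prefixes(base_prefixes):
    """Return '' plus every ordering and casing of each lowercase base prefix.

    Hypothetical helper, modeled on the shape of tokenize._all_string_prefixes.
    """
    result = {''}
    for base in base_prefixes:
        for ordering in itertools.permutations(base):
            # Each character may independently be lower or upper case.
            for cased in itertools.product(*[(c, c.upper()) for c in ordering]):
                result.add(''.join(cased))
    return result


def group(*choices):
    # Regex alternation, in the style of tokenize's group() helper.
    return '(' + '|'.join(choices) + ')'


# Assumed base lists: before and after adding the t-string prefixes.
OLD_BASES = ['b', 'r', 'u', 'f', 'br', 'fr']
NEW_BASES = OLD_BASES + ['t', 'tr']

old = expand_prefixes(OLD_BASES)
new = expand_prefixes(NEW_BASES)
print(len(old), len(new))  # 25 35

# A StringPrefix-style regex built from the old set cannot match a t prefix.
print(re.fullmatch(group(*old) + "'", "t'"))  # None
print(re.fullmatch(group(*new) + "'", "t'") is not None)  # True
```

With the old bases the sketch reproduces the 25-element set shown in the REPL session; adding 't' and 'tr' gives the 35 prefixes that the brute-force searches later in this thread find.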

CPython versions tested on:

3.15

Operating systems tested on:

Windows

Linked PRs


ericvsmith · 2025-05-25T21:48:21Z

@lysnikolaou @davepeck

larryhastings · 2025-05-25T21:56:49Z

I wrote a Python program that tries every possible prefix with eval and discards those that raise a SyntaxError. Here's the list it produced:

['', 'b', 'f', 'r', 't', 'u', 'B', 'F', 'R', 'T', 'U', 'br', 'bR', 'fr', 'fR', 'rb', 'rf', 'rt', 'rB', 'rF', 'rT', 'tr', 'tR', 'Br', 'BR', 'Fr', 'FR', 'Rb', 'Rf', 'Rt', 'RB', 'RF', 'RT', 'Tr', 'TR']

terryjreedy · 2025-05-26T11:00:07Z

The t-string prefixes are also not mentioned in https://docs.python.org/3.15/reference/lexical_analysis.html#string-and-bytes-literals

ericvsmith · 2025-05-26T15:48:31Z

I guess the best way to test this is to use @larryhastings's exhaustive approach: find all 2-letter (and maybe 3-letter, if it's not too slow) combinations and check that the result matches _all_string_prefixes. I don't see a way to test this through a public API, since _all_string_prefixes, endpats, and StringPrefix are all undocumented. Probably the best thing to use would be to compare the discovered list against set(tokenize.StringPrefix[1:-1].split('|')).
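A minimal sketch of that test, assuming StringPrefix keeps its current undocumented '(p1|p2|...)' form. It probes every ordering and casing of up to max_len distinct letters using compile() in 'eval' mode, so candidate expressions are only parsed, never executed, and then compares the accepted prefixes with the ones tokenize declares. discover_prefixes is a hypothetical helper name. Every prefix valid today has at most two letters, so max_len=2 is the default; max_len=3 also works but needs roughly 125,000 compiles.

```python
import itertools
import string
import tokenize


def discover_prefixes(max_len=2):
    """Brute-force every string prefix the running compiler accepts.

    Tries every ordering and casing of up to max_len distinct ASCII
    letters in front of "" and keeps the ones that compile.
    """
    found = {''}  # the unprefixed literal "" is always valid
    for length in range(1, max_len + 1):
        for letters in itertools.permutations(string.ascii_lowercase, length):
            for cased in itertools.product(*[(c, c.upper()) for c in letters]):
                prefix = ''.join(cased)
                if prefix == 'not':
                    # not "" is a valid expression, not a prefixed string.
                    continue
                try:
                    compile(prefix + '""', '<prefix-probe>', 'eval')
                except SyntaxError:
                    continue
                found.add(prefix)
    return found


# Undocumented internal: StringPrefix is assumed to be '(p1|p2|...)'.
declared = set(tokenize.StringPrefix[1:-1].split('|'))
discovered = discover_prefixes(2)
print('accepted but missing from tokenize:', sorted(discovered - declared))
print('in tokenize but rejected:', sorted(declared - discovered))
```

On an interpreter with this bug, the first line would list the t-string prefixes; on a consistent one, both lists are empty.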

ericvsmith · 2025-05-26T16:30:09Z

I'm thinking of using this:

import string
import itertools

# Try all lengths until we find a length that has zero valid prefixes.
# This will miss the case where for example there are no valid 3
# character prefixes, but there are valid 4 character prefixes.

valid_prefixes = set()
for length in itertools.count():
    num_at_this_length = 0
    # Each combination of distinct letters is tried in every order and casing.
    for letters in itertools.combinations(string.ascii_lowercase, length):
        for t in itertools.permutations(letters):
            for u in itertools.product(*[(c, c.upper()) for c in t]):
                p = ''.join(u)
                if p == "not":
                    # 'not' can never be a string prefix, because it's a valid
                    # expression: not ""
                    continue
                try:
                    eval(f'{p}""')
                    valid_prefixes.add(p)
                    num_at_this_length += 1
                except SyntaxError:
                    pass
    if num_at_this_length == 0:
        break

print(valid_prefixes)
print(len(valid_prefixes))

Which in 3.9 produces:

{'', 'u', 'rb', 'r', 'f', 'Rf', 'rF', 'fr', 'rB', 'Fr', 'U', 'BR', 'br', 'B', 'bR', 'fR', 'F', 'rf', 'b', 'Br', 'RF', 'FR', 'Rb', 'R', 'RB'}
25

And in 3.15 produces:

{'', 'b', 'bR', 'Br', 'RT', 'rb', 'fr', 'R', 'tR', 'B', 'f', 'TR', 'FR', 'RB', 't', 'rt', 'Fr', 'br', 'r', 'BR', 'rf', 'RF', 'U', 'Rb', 'F', 'Rt', 'rT', 'rB', 'Rf', 'Tr', 'rF', 'tr', 'fR', 'u', 'T'}
35

I'll produce a PR shortly.
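As a quick consistency check on the 3.9 and 3.15 outputs above (the two sets below are copied from them verbatim), the 3.15 set is a superset of the 3.9 set, and the 10 extra entries are exactly the case and order variants of t and tr.

```python
# Copied from the 3.9 and 3.15 output above.
py39 = {'', 'u', 'rb', 'r', 'f', 'Rf', 'rF', 'fr', 'rB', 'Fr', 'U', 'BR', 'br',
        'B', 'bR', 'fR', 'F', 'rf', 'b', 'Br', 'RF', 'FR', 'Rb', 'R', 'RB'}
py315 = {'', 'b', 'bR', 'Br', 'RT', 'rb', 'fr', 'R', 'tR', 'B', 'f', 'TR', 'FR',
         'RB', 't', 'rt', 'Fr', 'br', 'r', 'BR', 'rf', 'RF', 'U', 'Rb', 'F', 'Rt',
         'rT', 'rB', 'Rf', 'Tr', 'rF', 'tr', 'fR', 'u', 'T'}

added = py315 - py39
print(sorted(added))  # ['RT', 'Rt', 'T', 'TR', 'Tr', 'rT', 'rt', 't', 'tR', 'tr']
print(py39 <= py315, len(py39), len(py315), len(added))  # True 25 35 10
# Lowercasing collapses the new entries to the t-string base prefixes.
print(sorted({p.lower() for p in added}))  # ['rt', 't', 'tr']
```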

…s doc, and add a test to make sure we catch this error in the future. (#134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.
* Update lexical analysis docs for t-string prefixes.

…nalysis doc, and add a test to make sure we catch this error in the future. (pythonGH-134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.
* Update lexical analysis docs for t-string prefixes.
(cherry picked from commit 08c78e0)
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>

…analysis doc, and add a test to make sure we catch this error in the future. (GH-134734) (#134739)
gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. (GH-134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.
* Update lexical analysis docs for t-string prefixes.
(cherry picked from commit 08c78e0)
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>

ericvsmith added the type-bug An unexpected behavior, bug, or error label May 25, 2025

picnixz added the stdlib Python modules in the Lib dir label May 25, 2025

bedevere-app bot mentioned this issue May 26, 2025

gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. #134734

Merged

bedevere-app bot mentioned this issue May 26, 2025

[3.14] gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. (GH-134734) #134739

Merged

ericvsmith closed this as completed May 27, 2025
