Skip to content

Fix a crash when a pylint must display unicode raising a UnicodeEncodeError #9732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 10, 2025

Conversation

Pierre-Sassoulas
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas commented Jun 14, 2024

Type of Changes

Type
βœ“ πŸ› Bug fix

Description

Fixes for #8736, this is probably not the right fix, but it's a fix. Hoping for a surrogates/unicode expert to chime in with the right approach πŸ˜„ ! after becoming a surrogate expert myself. This prevent the crash in the general case (for all checkers even badly coded plugin) and make the message output for comparison of constant closer to the content of the code by using the AST instead of recalculating. We can backport half of this. Review is better commit by commit as there is two parts.

@Pierre-Sassoulas Pierre-Sassoulas added Work in progress Crash πŸ’₯ A bug that makes pylint crash labels Jun 14, 2024
@Pierre-Sassoulas
Copy link
Member Author

We should probably aim to keep the \ud800\udc00 format instead of transforming to actual utf32 characters.

@DanielNoord
Copy link
Collaborator

https://stackoverflow.com/questions/27366479/python-3-os-walk-file-paths-unicodeencodeerror-utf-8-codec-cant-encode-s

encode('utf8','replace') seems to work? Although I really don't like that we need to add special logic for a really uncommon case..

@@ -0,0 +1 @@
comparison-of-constants:3:7:3:37::"Comparison between constants: '𐀀 == ' has a constant value":HIGH
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
comparison-of-constants:3:7:3:37::"Comparison between constants: '𐀀 == ' has a constant value":HIGH
comparison-of-constants:3:7:3:37::"Comparison between constants: '"\U00010000" == "\ud800\udc00"' has a constant value":HIGH

Imo we should aim for this, but I have an intuition that the ast module internal might be involved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I don't think we can. self.out is utf-8 encoded so we always need to encode/escape to utf-8.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to be fixed in comparison-of-constant I think 6584761

This comment has been minimized.

@DanielNoord DanielNoord added the Needs take over πŸ›ŽοΈ Orignal implementer went away but the code could be salvaged. label Mar 30, 2025

This comment has been minimized.

@Pierre-Sassoulas Pierre-Sassoulas added Needs review πŸ” Needs to be reviewed by one or multiple more persons and removed Needs take over πŸ›ŽοΈ Orignal implementer went away but the code could be salvaged. Work in progress labels Aug 10, 2025
@Pierre-Sassoulas Pierre-Sassoulas added this to the 4.0.0 milestone Aug 10, 2025
comparison-of-constants:3:6:3:12::"Comparison between constants: '2 == 2' has a constant value":HIGH
comparison-of-constants:6:6:6:11::"Comparison between constants: '2 > 2' has a constant value":HIGH
comparison-of-constants:16:3:16:15::"Comparison between constants: 'True == True' has a constant value":HIGH
comparison-of-constants:3:6:3:12::"Comparison between constants: ""2 == 2"" has a constant value":HIGH
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears as double "" but it's due to the "csv" encoding (the string start with ") and it's actually fine.

Comment on lines +1 to +3
"""This does not crash in the functional tests, but it did when called directly."""

assert "\U00010000" == "\ud800\udc00" # [comparison-of-constants]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't manage to reproduce the crash automatically. Might be due to some unicode's magic already present in the functional test.

Copy link
Contributor

πŸ€– According to the primer, this change has no effect on the checked open source code. πŸ€–πŸŽ‰

This comment was generated for commit 6584761

@Pierre-Sassoulas Pierre-Sassoulas merged commit 0d056ff into pylint-dev:main Aug 10, 2025
46 checks passed
@Pierre-Sassoulas Pierre-Sassoulas deleted the fix-unicode-crash branch August 10, 2025 19:08
@Pierre-Sassoulas Pierre-Sassoulas removed the Needs review πŸ” Needs to be reviewed by one or multiple more persons label Aug 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Crash πŸ’₯ A bug that makes pylint crash
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants