fix(mysql): explicitly set charset=utf8mb4 to resolve connection failure with mysql-connector-python >= 8.0.30 #2656
Open
hirenkumar-n-dholariya wants to merge 2 commits into
…odadata#2583)

When using mysql-connector-python >= 8.0.30, the connector internally remaps charset 'utf8' to 'utf8mb4'. Since no charset was explicitly set in the connect() call, this remapping was triggered automatically and caused the following error on MySQL servers < 5.5.3:

"Character set 'utf8' unsupported"

Fix: explicitly pass charset="utf8mb4" and use_unicode=True to the mysql.connector.connect() call. This bypasses the internal remapping logic entirely and works correctly across all supported MySQL versions. utf8mb4 has been the recommended charset since MySQL 5.5.3 and is fully backwards compatible with utf8 data.

Fixes sodadata#2583
for more information, see https://pre-commit.ci
Author
Hi @bmarinovic @tomassatka @Niels-b, I have submitted this fix for issue #2583, where mysql-connector-python >= 8.0.30 silently remaps charset utf8 → utf8mb4, causing connection failures on MySQL servers older than 5.5.3. All checks are passing. Could someone please review when you get a chance? Happy to make any changes based on feedback. Thank you.
Author
Hi @bmarinovic @tomassatka @Niels-b, friendly bump on PR #2656. There is also a workflow awaiting approval from a maintainer before CI can fully run. Thank you!



Problem
When using soda-core-mysql with mysql-connector-python >= 8.0.30,
connecting to a MySQL data source fails with:
"Character set 'utf8' unsupported"
"Encountered a problem while trying to connect to mysql:
Character set 'utf8' unsupported"
Root Cause
The connect() method in mysql_data_source.py creates a MySQL connection
without specifying a charset parameter:
    mysql.connector.connect(
        user=self.username, password=self.password,
        host=self.host, port=self.port, database=self.database
    )
Starting with mysql-connector-python 8.0.30, when no charset is
specified, the connector defaults to 'utf8' and then silently remaps
utf8 → utf8mb4. This remapping fails on MySQL servers older than 5.5.3,
which do not support utf8mb4.
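To make the version boundary in the root cause concrete, here is a minimal sketch of a check for whether a server is at or above 5.5.3, the version named above as the first to support utf8mb4. The helper names `parse_server_version` and `supports_utf8mb4` are hypothetical and not part of soda-core or the connector; the sketch assumes a MySQL-style version string such as the one returned by `SELECT VERSION()`.

```python
import re
from typing import Tuple

# First MySQL release that supports utf8mb4, per the description above.
UTF8MB4_MIN_VERSION = (5, 5, 3)


def parse_server_version(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a string like '5.1.73-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized server version: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version: str) -> bool:
    """Hypothetical helper: True if the server version is >= 5.5.3."""
    return parse_server_version(version) >= UTF8MB4_MIN_VERSION
```

A check like this could be used when triaging a report to decide whether the server side or the connector side is the more likely source of the charset error.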
Fix
Explicitly pass charset="utf8mb4" and use_unicode=True:
    mysql.connector.connect(
        user=self.username, password=self.password,
        host=self.host, port=self.port, database=self.database,
        charset="utf8mb4", use_unicode=True
    )
This bypasses the connector's internal remapping logic entirely.
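For reviewers who want to try the change outside of soda-core, the following is a rough standalone sketch of the same idea. The functions `build_connect_kwargs` and `connect` are illustrative names, not the actual code in mysql_data_source.py; only the keyword arguments passed to `mysql.connector.connect()` are taken from this PR.

```python
from typing import Any, Dict


def build_connect_kwargs(
    username: str,
    password: str,
    host: str,
    port: int,
    database: str,
    charset: str = "utf8mb4",
) -> Dict[str, Any]:
    """Hypothetical helper: connection arguments with an explicit charset."""
    return {
        "user": username,
        "password": password,
        "host": host,
        "port": port,
        "database": database,
        # Set explicitly so the connector does not fall back to its default.
        "charset": charset,
        "use_unicode": True,
    }


def connect(**settings: Any):
    """Open a connection using the arguments built above."""
    # Imported lazily so build_connect_kwargs() works without the driver installed.
    import mysql.connector

    return mysql.connector.connect(**build_connect_kwargs(**settings))
```

Keeping the argument construction separate from the connect call makes the charset setting easy to unit test without a running MySQL server.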
Impact
References