Skip to content

fix(mysql): explicitly set charset=utf8mb4 to resolve connection failure with mysql-connector-python >= 8.0.30#2656

Open
hirenkumar-n-dholariya wants to merge 2 commits into
sodadata:v3from
hirenkumar-n-dholariya:patch-1
Open

fix(mysql): explicitly set charset=utf8mb4 to resolve connection failure with mysql-connector-python >= 8.0.30#2656
hirenkumar-n-dholariya wants to merge 2 commits into
sodadata:v3from
hirenkumar-n-dholariya:patch-1

Conversation

@hirenkumar-n-dholariya
Copy link
Copy Markdown

Problem

When using soda-core-mysql with mysql-connector-python >= 8.0.30,
connecting to a MySQL data source fails with:

"Character set 'utf8' unsupported"
"Encountered a problem while trying to connect to mysql:
Character set 'utf8' unsupported"

Root Cause

The connect() method in mysql_data_source.py creates a MySQL connection
without specifying a charset parameter:

mysql.connector.connect(
user=self.username, password=self.password,
host=self.host, port=self.port, database=self.database
)

Starting from mysql-connector-python 8.0.30, when no charset is
specified, the connector defaults to 'utf8' and then silently remaps
utf8 → utf8mb4. This remapping fails on MySQL servers older than 5.5.3
which do not support utf8mb4.

Fix

Explicitly pass charset="utf8mb4" and use_unicode=True:

mysql.connector.connect(
user=self.username, password=self.password,
host=self.host, port=self.port, database=self.database,
charset="utf8mb4", use_unicode=True
)

This bypasses the connector's internal remapping logic entirely.

Impact

  • Fixes connection failures for users on mysql-connector-python >= 8.0.30
  • No breaking change — utf8mb4 is fully backwards compatible with utf8
  • No impact on users running older connector versions

References

…odadata#2583)

When using mysql-connector-python >= 8.0.30, the connector internally remaps charset 'utf8' to 'utf8mb4'. Since no charset was explicitly set in the connect() call, this remapping triggered automatically and caused the error on MySQL servers < 5.5.3: "Character set 'utf8' unsupported"

Fix: explicitly pass charset="utf8mb4" and use_unicode=True to the
mysql.connector.connect() call. This bypasses the internal remapping logic entirely and works correctly across all supported MySQL versions.

utf8mb4 is the recommended charset since MySQL 5.5.3 and is fully backwards compatible with utf8 data.

Fixes sodadata#2583
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 10, 2026

CLA assistant check
All committers have signed the CLA.

@sonarqubecloud
Copy link
Copy Markdown

@hirenkumar-n-dholariya
Copy link
Copy Markdown
Author

Hi @bmarinovic @tomassatka @Niels-b

I have submitted this fix for issue #2583 where mysql-connector-python >= 8.0.30 silently remaps charset utf8 → utf8mb4, causing connection failures on MySQL servers older than 5.5.3.

All checks are passing!

Could someone please review when you get a chance? Happy to make any changes based on feedback. Thank you.

@hirenkumar-n-dholariya
Copy link
Copy Markdown
Author

Hi @bmarinovic @tomassatka @Niels-b

Friendly bump on PR #2656. There is also a workflow awaiting approval from a maintainer before CI can fully run.
Could someone please approve the workflow and review when you get a chance? All other checks are passing.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants