-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix unsupported character error in minio migration #5145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR fixes MinIO/S3 compatibility issues by enhancing the sanitize_s3_key_name
function in backend/onyx/file_store/s3_key_utils.py
. The changes address two specific problems with S3 key generation:
-
Forward slash handling: The function now replaces forward slashes (
/
) with hyphens (-
) to prevent S3 from interpreting them as path separators, which could create unintended directory structures or cause upload/download failures. -
Separator cleanup: A new regex pattern
[-_]{2,}
collapses multiple consecutive separators (dashes or underscores) into a single hyphen, ensuring cleaner and more consistent key formatting.
This enhancement fits into the broader file storage system that handles document uploads and management. The s3_key_utils.py
module is responsible for generating safe S3 object keys from potentially problematic file names, which is critical for the document indexing pipeline and file storage operations throughout the application.
Confidence score: 4/5
- This is a safe, targeted fix that addresses specific character handling issues without changing core functionality
- The changes are minimal and follow existing code patterns, reducing risk of introducing bugs
- The file
backend/onyx/file_store/s3_key_utils.py
should be reviewed to ensure the regex pattern and character replacement logic work correctly with edge cases
1 file reviewed, 1 comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic analysis
No issues found across 1 file. Review in cubic
* fix unsupported character error in minio migration * slash fix
* fix unsupported character error in minio migration * slash fix
Description
This pull request updates the
sanitize_s3_key_name
function inbackend/onyx/file_store/s3_key_utils.py
to improve the sanitization process for S3 key names. The changes aim to handle special characters more effectively and ensure consistent formatting of sanitized file names.Improvements to sanitization logic:
/
) in file names with hyphens (-
) to avoid issues with S3 key paths. ([backend/onyx/file_store/s3_key_utils.pyL56-R56](https://github.com/onyx-dot-app/onyx/pull/5145/files#diff-88c10e8801ae081482cd784617f79a3e9dae6648f297c5fc65aeee74a0a5957bL56-R56)
)-
or_
) into a single hyphen (-
) for cleaner and more consistent key names. ([backend/onyx/file_store/s3_key_utils.pyR84-R86](https://github.com/onyx-dot-app/onyx/pull/5145/files#diff-88c10e8801ae081482cd784617f79a3e9dae6648f297c5fc65aeee74a0a5957bR84-R86)
)Linear link -> https://linear.app/danswer/issue/DAN-2252/invalid-char-in-minio-put
How Has This Been Tested?
I wrote a script to insert files with problematic names, successfully reproduced the issues, and verified the changes using the script.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Improved S3 key name sanitization to handle unsupported characters in MinIO migrations. File names with slashes are now replaced with hyphens, and repeated separators are collapsed for cleaner keys.