Skip to content

Conversation

Subash-Mohan
Copy link
Contributor

@Subash-Mohan Subash-Mohan commented Aug 1, 2025

Description

This pull request updates the sanitize_s3_key_name function in backend/onyx/file_store/s3_key_utils.py to improve the sanitization process for S3 key names. The changes aim to handle special characters more effectively and ensure consistent formatting of sanitized file names.

Improvements to sanitization logic:

  • Replaced forward slashes (/) in file names with hyphens (-) to avoid issues with S3 key paths. ([backend/onyx/file_store/s3_key_utils.pyL56-R56](https://github.com/onyx-dot-app/onyx/pull/5145/files#diff-88c10e8801ae081482cd784617f79a3e9dae6648f297c5fc65aeee74a0a5957bL56-R56))
  • Added logic to collapse multiple consecutive separators (- or _) into a single hyphen (-) for cleaner and more consistent key names. ([backend/onyx/file_store/s3_key_utils.pyR84-R86](https://github.com/onyx-dot-app/onyx/pull/5145/files#diff-88c10e8801ae081482cd784617f79a3e9dae6648f297c5fc65aeee74a0a5957bR84-R86))

Linear link -> https://linear.app/danswer/issue/DAN-2252/invalid-char-in-minio-put

How Has This Been Tested?

I wrote a script to insert files with problematic names, successfully reproduced the issues, and verified the changes using the script.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Improved S3 key name sanitization to handle unsupported characters in MinIO migrations. File names with slashes are now replaced with hyphens, and repeated separators are collapsed for cleaner keys.

  • Bug Fixes
    • Replaces forward slashes with hyphens in file names.
    • Collapses multiple consecutive hyphens or underscores into a single hyphen.

@Subash-Mohan Subash-Mohan requested a review from a team as a code owner August 1, 2025 10:22
Copy link

vercel bot commented Aug 1, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 1, 2025 10:33am

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR fixes MinIO/S3 compatibility issues by enhancing the sanitize_s3_key_name function in backend/onyx/file_store/s3_key_utils.py. The changes address two specific problems with S3 key generation:

  1. Forward slash handling: The function now replaces forward slashes (/) with hyphens (-) to prevent S3 from interpreting them as path separators, which could create unintended directory structures or cause upload/download failures.

  2. Separator cleanup: A new regex pattern [-_]{2,} collapses multiple consecutive separators (dashes or underscores) into a single hyphen, ensuring cleaner and more consistent key formatting.

This enhancement fits into the broader file storage system that handles document uploads and management. The s3_key_utils.py module is responsible for generating safe S3 object keys from potentially problematic file names, which is critical for the document indexing pipeline and file storage operations throughout the application.

Confidence score: 4/5

  • This is a safe, targeted fix that addresses specific character handling issues without changing core functionality
  • The changes are minimal and follow existing code patterns, reducing risk of introducing bugs
  • The file backend/onyx/file_store/s3_key_utils.py should be reviewed to ensure the regex pattern and character replacement logic work correctly with edge cases

1 file reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 1 file. Review in cubic

@Weves Weves enabled auto-merge August 4, 2025 19:38
@Weves Weves disabled auto-merge August 4, 2025 19:38
@Weves Weves merged commit 146628e into main Aug 4, 2025
13 of 15 checks passed
@Weves Weves deleted the fix-minio-unsupported-characters branch August 4, 2025 19:42
wenxi-onyx pushed a commit that referenced this pull request Aug 11, 2025
* fix unsupported character error in minio migration

* slash fix
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
* fix unsupported character error in minio migration

* slash fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants