Bugfix/drive doc ids #4990
PR Summary
Fixes duplicate document issues in Google Drive integration by normalizing document IDs across the system, with supporting database migrations and UI improvements.
- New migration backend/alembic/versions/12635f6655b7_drive_canonical_ids.py implements document ID normalization by removing URL parameters and standardizing paths
- Modified backend/onyx/connectors/google_drive/doc_conversion.py to clean up Google Drive URLs by removing '/edit', '/view', and query parameters
- Enhanced web/src/app/admin/connector/[ccPairId]/IndexAttemptErrorsModal.tsx with improved scrollable error messages and consistent table layouts
- Added defensive 'DROP TABLE IF EXISTS CASCADE' statements in multiple migration files for safer schema updates
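The URL cleanup described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from doc_conversion.py or the migration: the function name `normalize_drive_url` and the constant `_TRAILING_ACTIONS` are hypothetical, and the sketch assumes that only a trailing '/edit' or '/view' segment, the query string, and the fragment need to be dropped.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical names for illustration; the PR's real helper is not shown here.
_TRAILING_ACTIONS = ("/edit", "/view")


def normalize_drive_url(url: str) -> str:
    """Drop the query string, fragment, and a trailing /edit or /view segment."""
    parts = urlsplit(url)
    # Ignore a trailing slash so ".../edit/" is treated like ".../edit".
    path = parts.path.rstrip("/")
    for suffix in _TRAILING_ACTIONS:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    # Rebuild the URL with an empty query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

With a rule like this, two links to the same file that differ only in '/edit' versus '/view' or in sharing parameters map to one ID, which is what prevents the duplicate documents.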
7 files reviewed, 1 comment
print(f"Failed to update document {current_doc_id}: {e}")
from httpx import HTTPStatusError
style: HTTPStatusError import should be at top of file with other imports
Suggested change:
- print(f"Failed to update document {current_doc_id}: {e}")
- from httpx import HTTPStatusError
+ print(f"Failed to update document {current_doc_id}: {e}")
+1
print(f"Failed to update document {current_doc_id}: {e}")
from httpx import HTTPStatusError
+1
2509bd1 to 90aae5f
* fixed id extraction in drive connector
* WIP migration
* full migration script
* migration works single tenant without duplicates
* tested single tenant with duplicate docs
* migrations and frontend
* tested multitenant
* fix connector tests
* make tests pass
Description
Fixes https://linear.app/danswer/issue/DAN-2162/drive-doc-deduplication
Fixes duplicate Google Drive document IDs.
How Has This Been Tested?
Tested in the UI and in a multi-tenant setup.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.