-
Notifications
You must be signed in to change notification settings - Fork 2.1k
v1 file connector with metadata #4552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR introduces metadata handling for file connectors, focusing on ZIP file metadata extraction and storage through a new zip_metadata
field across various components.
- Renames
DANSWER_METADATA_FILENAME
toONYX_METADATA_FILENAME
for consistency, but removes support for single metadata file in ZIP archives which could impact existing functionality - Adds empty
zip_metadata
dictionary initialization in connector configurations but lacks actual implementation in user files, suggesting incomplete feature rollout - Restricts file connectors to one ZIP file per connector which could be limiting for some use cases
- Metadata extraction could fail silently in edge cases (corrupted ZIPs) and lacks schema validation beyond JSON format
- Inconsistency between default configuration in
create()
method andDATestConnector
constructor could cause test issues
15 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
return self.zip_metadata.get(file_name, {}) or self.zip_metadata.get( | ||
os.path.basename(file_name), {} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The fallback to basename lookup could mask issues if there are files with the same name in different directories. Consider logging when falling back to basename lookup.
|
||
class FileUploadResponse(BaseModel): | ||
file_paths: list[str] | ||
zip_metadata: dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider adding a default value of empty dict to avoid potential None issues in consuming code
zip_metadata: dict[str, Any] | |
zip_metadata: dict[str, Any] = Field(default_factory=dict) |
|
||
|
||
def _read_files_and_metadata( | ||
def _read_file_from_postgres( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer not naming it this, since it might not be postgres soon 🙏
E.g. _read_file_from_filestore
Description
Fixes https://linear.app/danswer/issue/DAN-1836/fix-file-connector-metadata
How Has This Been Tested?
Tested in UI, daily connector test in the works
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.