Skip to content

Conversation

evan-onyx
Copy link
Contributor

Description

Fixes https://linear.app/danswer/issue/DAN-1836/fix-file-connector-metadata

How Has This Been Tested?

Tested in UI, daily connector test in the works

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@evan-onyx evan-onyx requested a review from a team as a code owner April 17, 2025 21:37
Copy link

vercel bot commented Apr 17, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 17, 2025 9:38pm

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR introduces metadata handling for file connectors, focusing on ZIP file metadata extraction and storage through a new zip_metadata field across various components.

  • Renames DANSWER_METADATA_FILENAME to ONYX_METADATA_FILENAME for consistency, but removes support for single metadata file in ZIP archives which could impact existing functionality
  • Adds empty zip_metadata dictionary initialization in connector configurations but lacks actual implementation in user files, suggesting incomplete feature rollout
  • Restricts file connectors to one ZIP file per connector which could be limiting for some use cases
  • Metadata extraction could fail silently in edge cases (corrupted ZIPs) and lacks schema validation beyond JSON format
  • Inconsistency between default configuration in create() method and DATestConnector constructor could cause test issues

15 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +295 to +297
return self.zip_metadata.get(file_name, {}) or self.zip_metadata.get(
os.path.basename(file_name), {}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The fallback to basename lookup could mask issues if there are files with the same name in different directories. Consider logging when falling back to basename lookup.


class FileUploadResponse(BaseModel):
file_paths: list[str]
zip_metadata: dict[str, Any]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider adding a default value of empty dict to avoid potential None issues in consuming code

Suggested change
zip_metadata: dict[str, Any]
zip_metadata: dict[str, Any] = Field(default_factory=dict)



def _read_files_and_metadata(
def _read_file_from_postgres(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer not naming it this, since it might not be postgres soon 🙏

E.g. _read_file_from_filestore

@evan-onyx evan-onyx added this pull request to the merge queue Apr 17, 2025
Merged via the queue into main with commit 953a4e3 Apr 17, 2025
10 of 11 checks passed
@evan-onyx evan-onyx deleted the file-connector-metadata branch April 17, 2025 23:55
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request Apr 26, 2025
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants