rkuo-danswer
Contributor

Description

Fixes https://linear.app/danswer/issue/DAN-1923/fix-binary-data-in-filesourcecard

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

vercel bot commented Apr 29, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Apr 30, 2025 0:00am |

Contributor

@raunakab raunakab left a comment

Just a small comment on why ind = 0 was necessary in prune_and_merge.py.

Other than the main bug, looks like the changes are primarily the addition of docs, moving some variables around here and there, and some formatting changes.

@rkuo-danswer rkuo-danswer marked this pull request as ready for review April 29, 2025 20:34
@rkuo-danswer rkuo-danswer requested a review from a team as a code owner April 29, 2025 20:34
Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR focuses on improving binary data handling in file sources, with several key changes:

  • Changed error handling in create_search_doc_from_user_file from 'replace' to 'strict' mode for better binary content detection in chat.py
  • Renamed and refactored file loading functions in utils.py for clarity: load_all_user_files → load_in_memory_chat_files and load_all_user_file_files → get_user_files
  • Added optimizations in process_message.py for handling large user files with a new 'fast path' for small files
  • Moved RECENT_DOCS_FOLDER_ID constant from user_documents/api.py to chat_backend.py for better code organization

The changes improve error handling for binary files while making the codebase more maintainable through better function naming and documentation.
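
For context on the first bullet, here is a minimal sketch of why strict decoding makes binary content detectable where 'replace' hides it. The helper names `decode_or_flag_binary` and `decode_lossy` and the placeholder string are hypothetical and are not taken from chat.py:

```python
def decode_or_flag_binary(raw: bytes, encoding: str = "utf-8") -> tuple[str, bool]:
    """Return (text, is_binary). Hypothetical helper, not the PR's code."""
    try:
        # errors="strict" raises on bytes that are not valid in the encoding.
        return raw.decode(encoding, errors="strict"), False
    except UnicodeDecodeError:
        # Assumed placeholder text; the real blurb handling may differ.
        return "[binary content]", True


def decode_lossy(raw: bytes, encoding: str = "utf-8") -> str:
    """errors="replace" never raises; each invalid byte becomes U+FFFD."""
    return raw.decode(encoding, errors="replace")
```

With 'replace', something like a PNG header decodes without error into replacement characters, so garbage can end up in a displayed blurb. With 'strict', the failure is explicit and the caller can decide what to show instead.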

6 file(s) reviewed, 2 comment(s)

Comment on lines +182 to 183
# 1. Load files specified by individual IDs
[(load_user_file, (file_id, db_session)) for file_id in user_file_ids]

logic: run_functions_tuples_in_parallel is called with a list comprehension, but the + operator sits outside the cast, which could cause type issues if the second operand has an unexpected type
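
One way to sidestep this concern, sketched under assumptions: build both task lists first, then apply the cast once to the concatenated result. Here `load_user_file` and `load_folder_files` are stand-in stubs, `Task` and `build_tasks` are hypothetical names, and the signature that run_functions_tuples_in_parallel actually expects is not shown in this thread:

```python
from typing import Any, Callable, cast

# Hypothetical alias for a (function, args) pair.
Task = tuple[Callable[..., Any], tuple[Any, ...]]


def load_user_file(file_id: int, db_session: object) -> str:
    """Stand-in stub; the real loader lives in the PR's utils.py."""
    return f"file-{file_id}"


def load_folder_files(folder_id: int, db_session: object) -> str:
    """Stand-in stub for a folder-level loader (assumed, not from the PR)."""
    return f"folder-{folder_id}"


def build_tasks(
    user_file_ids: list[int], user_folder_ids: list[int], db_session: object
) -> list[Task]:
    by_id = [(load_user_file, (file_id, db_session)) for file_id in user_file_ids]
    by_folder = [
        (load_folder_files, (folder_id, db_session)) for folder_id in user_folder_ids
    ]
    # The cast wraps the whole concatenation, so both operands are covered.
    return cast(list[Task], by_id + by_folder)
```

Note that `cast` only informs the type checker and does nothing at runtime, so it cannot catch a wrongly shaped second operand on its own.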

Comment on lines 228 to 232
user_files.extend(
    db_session.query(UserFile)
    .filter(UserFile.folder_id == user_folder_id)
    .all()
)

style: consider using a single query with an IN clause instead of multiple queries in a loop for better performance

Suggested change
- user_files.extend(
-     db_session.query(UserFile)
-     .filter(UserFile.folder_id == user_folder_id)
-     .all()
- )
+ user_files.extend(
+     db_session.query(UserFile)
+     .filter(UserFile.folder_id.in_(user_folder_ids))
+     .all()
+ )
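
To illustrate the trade-off, here is a self-contained sketch that uses the standard-library sqlite3 module rather than the SQLAlchemy session from the PR. The table name `user_file`, its columns, and both function names are assumptions made for the example. The single IN query is meant to replace the loop entirely; running it inside the existing loop would return the same rows once per folder.

```python
import sqlite3


def files_in_folders_looped(conn: sqlite3.Connection, folder_ids: list[int]) -> list[int]:
    """One query per folder, mirroring the loop in the original code."""
    file_ids: list[int] = []
    for folder_id in folder_ids:
        rows = conn.execute(
            "SELECT id FROM user_file WHERE folder_id = ?", (folder_id,)
        ).fetchall()
        file_ids.extend(row[0] for row in rows)
    return file_ids


def files_in_folders(conn: sqlite3.Connection, folder_ids: list[int]) -> list[int]:
    """A single query with an IN clause covering every folder at once."""
    if not folder_ids:
        # An empty IN () list is best avoided; there is nothing to fetch anyway.
        return []
    placeholders = ",".join("?" for _ in folder_ids)
    rows = conn.execute(
        f"SELECT id FROM user_file WHERE folder_id IN ({placeholders}) ORDER BY id",
        list(folder_ids),
    ).fetchall()
    return [row[0] for row in rows]
```

Both functions return the same set of files; the second makes one round trip to the database regardless of how many folders are requested.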

@rkuo-danswer rkuo-danswer added this pull request to the merge queue Apr 30, 2025
Merged via the queue into main with commit 94de23f Apr 30, 2025
11 checks passed
@rkuo-danswer rkuo-danswer deleted the bugfix/chat-images-2 branch April 30, 2025 02:25
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
* don't hardcode -1

* extra spaces

* fix binary data in blurb

* add note to binary handling

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>