-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Bugfix/chat images 2 #4630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/chat images 2 #4630
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small comment on why ind = 0
was necessary in prune_and_merge.py
.
Other than the main bug, looks like the changes are primarily the addition of docs, moving some variables around here and there, and some formatting changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR focuses on improving binary data handling in file sources, with several key changes:
- Changed error handling in
create_search_doc_from_user_file
from 'replace' to 'strict' mode for better binary content detection inchat.py
- Renamed and refactored file loading functions in
utils.py
for clarity:load_all_user_files
→load_in_memory_chat_files
andload_all_user_file_files
→get_user_files
- Added optimizations in
process_message.py
for handling large user files with a new 'fast path' for small files - Moved
RECENT_DOCS_FOLDER_ID
constant fromuser_documents/api.py
tochat_backend.py
for better code organization
The changes improve error handling for binary files while making the codebase more maintainable through better function naming and documentation.
6 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
# 1. Load files specified by individual IDs | ||
[(load_user_file, (file_id, db_session)) for file_id in user_file_ids] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: run_functions_tuples_in_parallel is called with a list comprehension but the + operator is outside the cast, which could cause type issues if the second part returns an unexpected type
user_files.extend( | ||
db_session.query(UserFile) | ||
.filter(UserFile.folder_id == user_folder_id) | ||
.all() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: consider using a single query with an IN clause instead of multiple queries in a loop for better performance
user_files.extend( | |
db_session.query(UserFile) | |
.filter(UserFile.folder_id == user_folder_id) | |
.all() | |
) | |
user_files.extend( | |
db_session.query(UserFile) | |
.filter(UserFile.folder_id.in_(user_folder_ids)) | |
.all() | |
) |
* don't hardcode -1 * extra spaces * fix binary data in blurb * add note to binary handling --------- Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Description
Fixes https://linear.app/danswer/issue/DAN-1923/fix-binary-data-in-filesourcecard
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.