restructure to signal activity while processing #4712
Conversation
PR Summary
This PR restructures the Salesforce connector to prevent timeouts during CSV processing and improves error message handling in the image summarization module.
- Refactored _fetch_from_salesforce in backend/onyx/connectors/salesforce/connector.py to yield empty document lists during CSV processing, preventing connector timeouts (see the sketch below)
- Added memory management with gc.collect() calls after batch processing in backend/onyx/connectors/salesforce/connector.py to address potential memory leaks
- Implemented byte size tracking for document batches in backend/onyx/connectors/salesforce/connector.py to prevent memory issues
- Added error message truncation to 1024 chars in backend/onyx/file_processing/image_summarization.py for better log management
- Improved CSV file cleanup by removing processed files immediately in backend/onyx/connectors/salesforce/connector.py
2 file(s) reviewed, 3 comment(s)
Greptile
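Roughly, the restructure works like the generator sketch below. This is a minimal illustration, not the connector's actual code: fetch_documents, read_rows, and MAX_BATCH_BYTES are hypothetical names, and the real batch-size threshold is not shown in this review.

import csv
import gc
import sys
from collections.abc import Iterator

MAX_BATCH_BYTES = 1_000_000  # assumed threshold; the PR's actual limit is not shown here


def read_rows(csv_path: str) -> Iterator[dict]:
    # Stand-in for the connector's real CSV processing.
    with open(csv_path, newline="") as f:
        yield from csv.DictReader(f)


def fetch_documents(csv_paths: list[str]) -> Iterator[list[dict]]:
    batch: list[dict] = []
    batch_bytes = 0
    for csv_path in csv_paths:
        # Yield an empty batch before the heavy per-file work so the caller
        # sees activity and does not time the connector out.
        yield []
        for row in read_rows(csv_path):
            batch.append(row)
            # sys.getsizeof is a shallow measure; it is enough to bound
            # batch growth for this sketch.
            batch_bytes += sys.getsizeof(row)
            if batch_bytes >= MAX_BATCH_BYTES:
                yield batch
                batch = []
                batch_bytes = 0
                gc.collect()  # reclaim memory after each emitted batch
    if batch:
        yield batch

Yielding empty lists keeps the indexing loop's activity signal alive without emitting placeholder documents, which is the "signal activity while processing" of the PR title.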
error_msg = f"Summarization failed. Messages: {messages}"
error_msg = error_msg[:1024]
style: Consider using a constant for the max error message length (1024) to make it configurable and maintain consistency
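A minimal sketch of this suggestion; MAX_ERROR_MSG_LENGTH is an illustrative name, not one taken from the codebase:

MAX_ERROR_MSG_LENGTH = 1024  # illustrative constant name

messages = ["<chat payload>"]  # stand-in for the variable already in scope
error_msg = f"Summarization failed. Messages: {messages}"
error_msg = error_msg[:MAX_ERROR_MSG_LENGTH]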
new_ids = sf_db.update_from_csv(
    object_type=object_type,
    csv_download_path=csv_path,
)
updated_ids.update(new_ids)
logic: no error handling around update_from_csv call - could silently fail and continue processing
Suggested change:
- new_ids = sf_db.update_from_csv(
-     object_type=object_type,
-     csv_download_path=csv_path,
- )
- updated_ids.update(new_ids)
+ try:
+     new_ids = sf_db.update_from_csv(
+         object_type=object_type,
+         csv_download_path=csv_path,
+     )
+     updated_ids.update(new_ids)
+ except Exception as e:
+     logger.error(f"Failed to update from CSV {csv_path}: {e}")
+     continue
f"Added {len(new_ids)} new/updated records for {object_type}" | ||
) | ||
|
||
os.remove(csv_path) |
style: file removal should be in a try/finally block to ensure cleanup even if update_from_csv fails
Suggested change:
- os.remove(csv_path)
+ try:
+     new_ids = sf_db.update_from_csv(
+         object_type=object_type,
+         csv_download_path=csv_path,
+     )
+     updated_ids.update(new_ids)
+ finally:
+     os.remove(csv_path)
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Description
Fixes https://linear.app/danswer/issue/DAN-1977/mitigate-connector-timeout-in-salesforce-connector-when-initial
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.