rkuo-danswer
Contributor

Description

Fixes https://linear.app/danswer/issue/DAN-1977/mitigate-connector-timeout-in-salesforce-connector-when-initial

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

vercel bot commented May 14, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 14, 2025 9:24pm |

@rkuo-danswer rkuo-danswer marked this pull request as ready for review May 14, 2025 21:21
@rkuo-danswer rkuo-danswer requested a review from a team as a code owner May 14, 2025 21:21
Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR restructures the Salesforce connector to prevent timeouts during CSV processing and improves error message handling in the image summarization module.

  • Refactored _fetch_from_salesforce in backend/onyx/connectors/salesforce/connector.py to yield empty document lists during CSV processing, preventing connector timeouts
  • Added memory management with gc.collect() calls after batch processing in backend/onyx/connectors/salesforce/connector.py to address potential memory leaks
  • Implemented byte size tracking for document batches in backend/onyx/connectors/salesforce/connector.py to prevent memory issues
  • Added error message truncation to 1024 chars in backend/onyx/file_processing/image_summarization.py for better log management
  • Improved CSV file cleanup by removing processed files immediately in backend/onyx/connectors/salesforce/connector.py
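The empty-batch idea in the summary can be illustrated with a minimal sketch. It is not the connector's actual code: `process_csvs_with_heartbeat` and the string "documents" are hypothetical stand-ins, and it assumes the caller treats every yielded batch, even an empty one, as a sign of progress.

```python
import gc
from collections.abc import Iterator


def process_csvs_with_heartbeat(csv_paths: list[str]) -> Iterator[list[str]]:
    """Hypothetical stand-in for the CSV phase of a connector generator."""
    for path in csv_paths:
        # Placeholder for the slow per-file work (e.g. loading a CSV into a local DB).
        _ = path.upper()
        # Yield an empty batch so the consumer regains control during long processing.
        yield []
        # Ask the garbage collector to reclaim memory from the finished file.
        gc.collect()
    # After the CSV phase, a real connector would yield actual document batches.
    yield [f"doc-from-{p}" for p in csv_paths]


batches = list(process_csvs_with_heartbeat(["a.csv", "b.csv"]))
```

Here `batches` holds two empty lists followed by one populated batch, so a consumer iterating the generator gets control back once per CSV file instead of only at the end.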

2 file(s) reviewed, 3 comment(s)

Comment on lines +106 to +107
error_msg = f"Summarization failed. Messages: {messages}"
error_msg = error_msg[:1024]

style: Consider using a constant for the max error message length (1024) to make it configurable and maintain consistency
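One way to act on this suggestion, shown as a sketch only: the constant name `MAX_ERROR_MSG_LEN` and the helper `build_error_message` are hypothetical and do not appear in the PR, which hardcodes 1024 inline.

```python
# Hypothetical constant name; the PR itself hardcodes 1024 at the call site.
MAX_ERROR_MSG_LEN = 1024


def build_error_message(messages: list[str], max_len: int = MAX_ERROR_MSG_LEN) -> str:
    """Format the failure message and cap its length for logging."""
    error_msg = f"Summarization failed. Messages: {messages}"
    # Slicing is a no-op when the message is already shorter than max_len.
    return error_msg[:max_len]


long_msg = build_error_message(["x" * 5000])
short_msg = build_error_message(["ok"])
```

Keeping the limit in one named place means other call sites can reuse it, and the default argument still allows a per-call override.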

Comment on lines +246 to +250
new_ids = sf_db.update_from_csv(
    object_type=object_type,
    csv_download_path=csv_path,
)
updated_ids.update(new_ids)

logic: no error handling around update_from_csv call - could silently fail and continue processing

Suggested change
- new_ids = sf_db.update_from_csv(
-     object_type=object_type,
-     csv_download_path=csv_path,
- )
- updated_ids.update(new_ids)
+ try:
+     new_ids = sf_db.update_from_csv(
+         object_type=object_type,
+         csv_download_path=csv_path,
+     )
+     updated_ids.update(new_ids)
+ except Exception as e:
+     logger.error(f"Failed to update from CSV {csv_path}: {e}")
+     continue

f"Added {len(new_ids)} new/updated records for {object_type}"
)

os.remove(csv_path)

style: file removal should be in a try/finally block to ensure cleanup even if update_from_csv fails

Suggested change
- os.remove(csv_path)
+ try:
+     new_ids = sf_db.update_from_csv(
+         object_type=object_type,
+         csv_download_path=csv_path,
+     )
+     updated_ids.update(new_ids)
+ finally:
+     os.remove(csv_path)
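The two review suggestions (log and continue on failure, always remove the file) can be combined in one loop. The following is a sketch under assumptions, not the merged code: `update_from_csv` here is a hypothetical module-level stand-in for `sf_db.update_from_csv` that fails on empty files, and `process_csvs` is an illustrative wrapper.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def update_from_csv(csv_path: str) -> set[str]:
    """Hypothetical stand-in for sf_db.update_from_csv; raises on empty files."""
    with open(csv_path, encoding="utf-8") as f:
        ids = {line.strip() for line in f if line.strip()}
    if not ids:
        raise ValueError("empty CSV")
    return ids


def process_csvs(csv_paths: list[str]) -> set[str]:
    updated_ids: set[str] = set()
    for csv_path in csv_paths:
        try:
            new_ids = update_from_csv(csv_path)
            updated_ids.update(new_ids)
        except Exception:
            # Log with traceback, then move on to the next file.
            logger.exception("Failed to update from CSV %s", csv_path)
            continue
        finally:
            # Runs on success and on failure, before `continue` takes effect.
            os.remove(csv_path)
    return updated_ids


# Demo with one empty (failing) file and one valid file.
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "good.csv")
bad = os.path.join(tmp_dir, "bad.csv")
with open(good, "w", encoding="utf-8") as f:
    f.write("001\n002\n")
open(bad, "w", encoding="utf-8").close()

result = process_csvs([bad, good])
```

Whether a failed CSV should be skipped or should abort the whole run is a design decision the thread does not settle; replacing `continue` with `raise` keeps the cleanup while propagating the error.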

@Weves Weves added this pull request to the merge queue May 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 15, 2025
@rkuo-danswer rkuo-danswer added this pull request to the merge queue May 15, 2025
Merged via the queue into main with commit a44f289 May 15, 2025
10 of 11 checks passed
@rkuo-danswer rkuo-danswer deleted the bugfix/salesforce-timeout branch May 15, 2025 06:19
ferdinandl007 pushed a commit to ferdinandl007/onyx that referenced this pull request May 19, 2025
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request May 27, 2025
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
ZhipengHe pushed a commit to ZhipengHe/onyx that referenced this pull request Jun 6, 2025
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>