@evan-onyx evan-onyx commented Apr 18, 2025

Description

Addresses https://linear.app/danswer/issue/DAN-1848/confluence-churning-forever-without-checkpointing
Addresses behavior we've seen where a connector appears to fetch attachments continuously without ever checkpointing. We now return a checkpoint after the outer for loop has attempted to process a batch of documents, whether or not they had errors. We also log when we start processing documents. This is intended to help with some 403s we've been seeing on attachment retrieval by surfacing the issue faster.

How Has This Been Tested?

Tests TBD.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@evan-onyx evan-onyx requested a review from a team as a code owner April 18, 2025 20:40
vercel bot commented Apr 18, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 18, 2025 11:44pm

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR modifies the Confluence connector to address an issue with continuous attachment retrieval and improve error visibility.

  • Added logging at document processing start to surface attachment retrieval issues faster
  • Modified checkpoint logic to return after batch processing regardless of document errors
  • Renamed doc_count to yield_count to accurately reflect tracking of both documents and errors
  • Added comprehensive test coverage in /backend/tests/unit/onyx/connectors/confluence/test_confluence_checkpointing.py to verify checkpoint behavior
  • Improved error handling for attachment processing in _fetch_page_attachments method

The changes follow the repository's principles of failing loudly and providing clear error visibility while maintaining strict typing and clear logical boundaries.
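
For readers without the diff at hand, the checkpoint behavior described above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `Document`, `Failure`, `Checkpoint`, `process_batch`, `run_batch`, and the `fetch` callable are all hypothetical names, and only `yield_count` is taken from the summary. The sketch assumes a generator that yields documents or failures and returns a checkpoint once the batch loop finishes.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Generator, List, Tuple, Union

logger = logging.getLogger(__name__)


# Hypothetical stand-ins for the connector's real models.
@dataclass
class Document:
    id: str


@dataclass
class Failure:
    page_id: str
    message: str


@dataclass
class Checkpoint:
    next_offset: int
    has_more: bool


def process_batch(
    page_ids: List[str],
    checkpoint: Checkpoint,
    fetch: Callable[[str], Document],
    batch_size: int = 2,
) -> Generator[Union[Document, Failure], None, Checkpoint]:
    """Yield one item per page in the batch, then return a new checkpoint."""
    start = checkpoint.next_offset
    batch = page_ids[start : start + batch_size]
    # Log up front so a stall or a run of 403s is visible immediately.
    logger.info("Processing %d pages starting at offset %d", len(batch), start)

    yield_count = 0  # counts documents AND failures, not just documents
    for page_id in batch:
        try:
            item: Union[Document, Failure] = fetch(page_id)
        except Exception as exc:  # e.g. a 403 on attachment retrieval
            item = Failure(page_id, str(exc))
        yield item
        yield_count += 1

    # The checkpoint advances past every attempted page, errors included,
    # so a failing page cannot keep the connector on the same batch forever.
    next_offset = start + yield_count
    return Checkpoint(next_offset, has_more=next_offset < len(page_ids))


def run_batch(
    gen: Generator[Union[Document, Failure], None, Checkpoint],
) -> Tuple[List[Union[Document, Failure]], Checkpoint]:
    """Drain the generator and capture the checkpoint it returns."""
    items: List[Union[Document, Failure]] = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value
```

The point of counting failures in `yield_count` is that progress is measured by pages attempted rather than pages successfully indexed, which is what lets the checkpoint move forward when a page errors out.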

1 file(s) reviewed, 2 comment(s)
@Weves Weves enabled auto-merge April 19, 2025 01:58
@Weves Weves added this pull request to the merge queue Apr 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 19, 2025
@Weves Weves merged commit 5681df9 into main Apr 19, 2025
10 of 12 checks passed
@Weves Weves deleted the confluence-attachment-403 branch April 19, 2025 22:53
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request Apr 26, 2025
* address getting attachments forever

* fix unit tests
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
* address getting attachments forever

* fix unit tests
2 participants