Member

@wenxi-onyx wenxi-onyx commented Jun 4, 2025

Description

The SharePoint connector did not correctly list the sites in the organization when no sites were specified.

Fixes https://linear.app/danswer/issue/DAN-2044/sharepoint-indexing-bug

How Has This Been Tested?

Created a SharePoint connector with no sites specified.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@wenxi-onyx wenxi-onyx requested a review from a team as a code owner June 4, 2025 00:46

vercel bot commented Jun 4, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jun 5, 2025 6:20pm |

@wenxi-onyx wenxi-onyx changed the title from "Fixed indexing when no sites are specificed" to "Fixed indexing when no sites are specified" Jun 4, 2025
Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

Fixed SharePoint connector's site listing functionality to properly discover and index all available sites when no specific sites are provided in the configuration.

  • Modified _fetch_sites method in backend/onyx/connectors/sharepoint/connector.py to correctly iterate through sites.current_page instead of using a single resource_url
  • Improved site discovery by using each site's web_url for building site descriptors, enabling comprehensive organization-wide indexing
  • Added support for automatic site discovery when no sites are explicitly specified in the connector configuration
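
To make the first two points concrete, here is a minimal sketch of building one descriptor per site instead of a single descriptor from the collection's `resource_url`. The `SiteDescriptor` field names come from the diff in this PR; `FakeSite`, `FakeSiteCollection`, `build_site_descriptors`, and the example URLs are hypothetical stand-ins so the snippet runs without a Graph client, and they are not the connector's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteDescriptor:
    # Field names taken from the diff in this PR; the real class may differ.
    url: str
    drive_name: Optional[str]
    folder_path: Optional[str]


@dataclass
class FakeSite:
    # Hypothetical stand-in for one site object returned by the Graph client.
    web_url: str


class FakeSiteCollection:
    # Hypothetical stand-in for the collection returned by sites.get_all().
    # resource_url identifies the collection endpoint, not any single site.
    resource_url = "https://graph.example.invalid/v1.0/sites"

    def __init__(self, sites: List[FakeSite]) -> None:
        self.current_page = list(sites)


def build_site_descriptors(sites: FakeSiteCollection) -> List[SiteDescriptor]:
    """Build one descriptor per site on the current page."""
    return [
        SiteDescriptor(url=site.web_url, drive_name=None, folder_path=None)
        for site in sites.current_page
    ]


sites = FakeSiteCollection(
    [
        FakeSite("https://contoso.example/sites/hr"),
        FakeSite("https://contoso.example/sites/eng"),
    ]
)
descriptors = build_site_descriptors(sites)
print([d.url for d in descriptors])
```

Each descriptor now points at an individual site URL, which is what lets downstream indexing visit every site rather than a single collection endpoint.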

1 file(s) reviewed, 1 comment(s)

Comment on lines 231 to 241

      sites = self.graph_client.sites.get_all().execute_query()
    - site_descriptors = [
    -     SiteDescriptor(
    -         url=sites.resource_url,
    -         drive_name=None,
    -         folder_path=None,
    + site_descriptors = []
    + for site in sites.current_page:
    +     site_descriptors.append(
    +         SiteDescriptor(
    +             url=site.web_url,
    +             drive_name=None,
    +             folder_path=None,
    +         )
          )
    - ]

Contributor

logic: Consider handling pagination here - current_page suggests there might be more pages of sites that aren't being fetched

Suggested change

      sites = self.graph_client.sites.get_all().execute_query()
    - site_descriptors = [
    -     SiteDescriptor(
    -         url=sites.resource_url,
    -         drive_name=None,
    -         folder_path=None,
    + site_descriptors = []
    + for site in sites.current_page:
    +     site_descriptors.append(
    +         SiteDescriptor(
    +             url=site.web_url,
    +             drive_name=None,
    +             folder_path=None,
    +         )
          )
    - ]

    + sites = self.graph_client.sites.get_all().execute_query()
    + site_descriptors = []
    + while True:
    +     for site in sites.current_page:
    +         site_descriptors.append(
    +             SiteDescriptor(
    +                 url=site.web_url,
    +                 drive_name=None,
    +                 folder_path=None,
    +             )
    +         )
    +     if not sites.has_next:
    +         break
    +     sites.get_next().execute_query()
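
The loop suggested above can be exercised in isolation. The sketch below is only an illustration of the pattern: `PagedCollection` is a hypothetical stand-in whose `current_page`, `has_next`, `get_next()`, and `execute_query()` mirror the names used in the suggestion. Whether the real Graph client exposes exactly these members is an assumption and should be checked against the library.

```python
from typing import Iterator, List


class PagedCollection:
    """Hypothetical paged result set that advances in place, one page at a time."""

    def __init__(self, pages: List[List[str]]) -> None:
        # Guarantee at least one (possibly empty) page so current_page is valid.
        self._pages = [list(p) for p in pages] or [[]]
        self._index = 0

    @property
    def current_page(self) -> List[str]:
        return self._pages[self._index]

    @property
    def has_next(self) -> bool:
        return self._index < len(self._pages) - 1

    def get_next(self) -> "PagedCollection":
        # Moves to the next page; returns self so execute_query() can be chained.
        self._index += 1
        return self

    def execute_query(self) -> "PagedCollection":
        # No-op here; a real client would perform the network request.
        return self


def iter_all_items(collection: PagedCollection) -> Iterator[str]:
    """Yield every item from every page, following the suggested loop shape."""
    while True:
        yield from collection.current_page
        if not collection.has_next:
            break
        collection.get_next().execute_query()


pages = PagedCollection(
    [
        ["https://contoso.example/sites/a", "https://contoso.example/sites/b"],
        ["https://contoso.example/sites/c"],
        [],
    ]
)
all_urls = list(iter_all_items(pages))
print(all_urls)
```

Wrapping the loop in a generator keeps the paging logic separate from descriptor construction, so the caller can build a `SiteDescriptor` per yielded item without caring how many pages were fetched.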

Contributor

@Weves Weves left a comment

@wenxi-onyx can we add a test to the test_sharepoint_connector.py file that checks this case?

@wenxi-onyx
Member Author

@Weves Added a test. Note that it only checks that some docs are retrieved; it does not assert an expected set of sites because the number of sites in our tenant may change at any time.
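
A rough sketch of what such a "retrieves anything at all" test might look like follows. `StubSharepointConnector`, its `load_from_state()` batches, and the document IDs are hypothetical stand-ins so the example runs offline; the actual test in test_sharepoint_connector.py runs against a live tenant and may be structured differently.

```python
from typing import Iterator, List, Optional


class StubSharepointConnector:
    """Hypothetical stand-in for the connector; yields batches of document IDs."""

    def __init__(self, sites: Optional[List[str]] = None) -> None:
        # An empty list models "no sites specified", i.e. index every site.
        self.sites = sites or []

    def load_from_state(self) -> Iterator[List[str]]:
        # A real connector would discover sites and fetch their documents here.
        yield ["doc-1", "doc-2"]
        yield ["doc-3"]


def test_index_all_sites_returns_documents() -> List[str]:
    connector = StubSharepointConnector(sites=[])
    docs = [doc for batch in connector.load_from_state() for doc in batch]
    # Assert only that something came back: the set of sites in a shared
    # tenant can change, so an exact expected list would be brittle.
    assert len(docs) > 0
    return docs


retrieved = test_index_all_sites_returns_documents()
print(len(retrieved))
```

The trade-off is that a test this loose catches "nothing was indexed" regressions but would not notice if only a subset of sites were discovered.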

@wenxi-onyx wenxi-onyx added this pull request to the merge queue Jun 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@Weves Weves added this pull request to the merge queue Jun 5, 2025
Merged via the queue into main with commit dc4b9bc Jun 6, 2025
11 checks passed
@Weves Weves deleted the sharepoint-indexing-fix branch June 6, 2025 00:20
ZhipengHe pushed a commit to ZhipengHe/onyx that referenced this pull request Jun 6, 2025
* Fixed indexing when no sites are specificed

* Added test for Sharepoint all sites index

* Accounted for paginated results.

* Typing

* Typing

---------

Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
* Fixed indexing when no sites are specificed

* Added test for Sharepoint all sites index

* Accounted for paginated results.

* Typing

* Typing

---------

Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>