Skip to content

Conversation

nsklei
Copy link
Contributor

@nsklei nsklei commented Aug 11, 2025

The Feature allows users to choose whether they want to index only SharePoint pages, only SharePoint documents, or both.
The PR adds the backend logic to the sharepoint connector.py and the necessary UI elements to the connectors.tsx.

image

How Has This Been Tested?

include_site_pages: bool = True, include_site_documents: bool = True
include_site_pages: bool = True, include_site_documents: bool = False
include_site_pages: bool = False, include_site_documents: bool = True## Description

I tested the UI elements and its functionality manually by building the images locally and starting the application.
I created three Sharepoint connectors with permutation of all three options (see below) and tested them in an assistant.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Added options to let users choose whether to index SharePoint documents, SharePoint pages, or both when setting up a connector.

  • New Features
  • Added checkboxes in the UI for selecting documents and/or pages.
  • Updated backend logic to support these options.

@nsklei nsklei requested a review from a team as a code owner August 11, 2025 15:03
Copy link

vercel bot commented Aug 11, 2025

Someone is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR enhances the SharePoint connector to provide granular control over content indexing by adding a new include_site_documents parameter alongside the existing include_site_pages parameter. Previously, the SharePoint connector would always index documents from SharePoint document libraries, with only site pages being optional. Now users can independently control whether to index SharePoint documents and/or SharePoint site pages (.aspx files).

The implementation adds the include_site_documents: bool = True parameter to the SharepointConnector constructor, maintaining backward compatibility by defaulting to True. The document processing logic in _fetch_from_sharepoint() is now wrapped in a conditional check (if self.include_site_documents:) that controls whether drive items are fetched and processed into documents. This follows the same pattern already established for site pages with the existing include_site_pages parameter.

The frontend changes add corresponding UI elements in the connector configuration form, providing two checkboxes for "Index Documents" and "Index ASPX Sites" in an advanced configuration section. Both options default to enabled, preserving existing behavior while giving users the flexibility to selectively index content types based on their specific needs.

The test suite is updated to explicitly specify both parameters in SharePoint connector instantiation, ensuring tests continue to work correctly with the expanded API. This change integrates well with the existing connector architecture and maintains the established patterns for optional content selection in Onyx connectors.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it adds optional functionality with sensible defaults
  • Score reflects simple, well-contained changes that follow existing patterns and maintain backward compatibility
  • No files require special attention as the changes are straightforward and properly tested

3 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@wenxi-onyx
Copy link
Member

Hey @nsklei - Looking at #5173, I think some commits were lost when you swapped over to PR from a new branch. Would you mind taking a look at this?

@nsklei
Copy link
Contributor Author

nsklei commented Aug 11, 2025

Hey @wenxi-onyx - Looks good from my side. Originally it were just one commit with 3 files changed.
The second commit in #5173 was just the unintended result of syncing the main branch of my fork.

nsklei pushed a commit to nsklei/onyx that referenced this pull request Aug 14, 2025
@nsklei nsklei force-pushed the feat/sharepoint-content-selection branch from cbff29a to 29f4d89 Compare August 14, 2025 16:20
@wenxi-onyx
Copy link
Member

wenxi-onyx commented Aug 14, 2025

Hey @nsklei, would you mind fixing one last issue I found in this connector (not caused by you)?

Could you please add

# Exclude personal sites because GET personal site pages returns 404
if "-my.sharepoint" in site_descriptor.url:
   return []

At the start of _fetch_site_pages()
https://github.com/onyx-dot-app/onyx/pull/5183/files#diff-0e02c6711532c1b02c5f40fe38179e67d7bdb918fc826c6f3e1a1c6017ae14e0R794

Thank you so much!!

@nsklei nsklei force-pushed the feat/sharepoint-content-selection branch from 52c5757 to 6d80fa2 Compare August 14, 2025 21:39
@wenxi-onyx wenxi-onyx merged commit a605bd4 into onyx-dot-app:main Aug 14, 2025
7 of 13 checks passed
@nsklei nsklei deleted the feat/sharepoint-content-selection branch September 5, 2025 19:46
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
…ot-app#5183)

* feat: make sharepoint documents and sharepoint pages optional

* fix: address review feedback for PR onyx-dot-app#5183

* fix: exclude personal sites from sharepoint connector

---------

Co-authored-by: Nils Kleinrahm <nils.kleinrahm@pledoc.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants