feat: add GitHub Pages connector #5378
base: main
Someone is attempting to deploy a commit to the Danswer Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary
This PR adds a GitHub Pages connector to Onyx that indexes GitHub Pages websites through GitHub's API rather than by web scraping. The implementation follows established patterns by creating a new GithubPagesConnector class that extends both the LoadConnector and CheckpointedConnector interfaces.
Backend Changes:
- New connector implementation (backend/onyx/connectors/github_pages/connector.py): The main connector class fetches source files directly from GitHub repositories using the GitHub API, processes various file types (HTML, Markdown, etc.), and creates documents with GitHub Pages-style URLs. When GitHub Pages is enabled it discovers the published URLs; when it is not, it falls back to processing the source files directly.
- Constants and factory integration (backend/onyx/configs/constants.py, backend/onyx/connectors/factory.py): Added the GITHUB_PAGES enum to DocumentSource and integrated the connector into the factory mapping system.
- Slack integration (backend/onyx/onyxbot/slack/icons.py): Added icon mapping for the new source type to maintain consistency in Slack bot displays.
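The idea of "GitHub Pages-style URLs" in the first bullet can be illustrated with a minimal sketch. The function name pages_url and the rules it applies (Markdown rendered to .html, index.html served at the directory URL) are assumptions made for illustration, not the mapping used in the PR's connector.py:

```python
from posixpath import splitext


def pages_url(owner: str, repo: str, path: str) -> str:
    """Map a repository file path to a GitHub Pages-style URL (illustrative only)."""
    owner = owner.lower()
    base = f"https://{owner}.github.io"
    # A repo named <owner>.github.io is a user/organization site served from the
    # domain root; any other repo is a project site served under /<repo>.
    if repo.lower() != f"{owner}.github.io":
        base += f"/{repo}"

    stem, ext = splitext(path.lstrip("/"))
    # Assumption: Markdown sources are published as .html pages.
    if ext.lower() in {".md", ".markdown"}:
        ext = ".html"
    out = stem + ext

    # Assumption: index.html is addressed by its directory URL.
    if out == "index.html":
        return base + "/"
    if out.endswith("/index.html"):
        out = out[: -len("index.html")]
    return f"{base}/{out}"
```

For example, pages_url("melmathari", "GitHub-pages", "docs/guide.md") yields https://melmathari.github.io/GitHub-pages/docs/guide.html under these assumptions.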
Frontend Changes:
- UI components: Added a GithubPagesIcon component reusing the existing GitHub icon for visual consistency.
- Type definitions (web/src/lib/types.ts): Added GitHubPages to the ValidSources enum and included it in validAutoSyncSources for automatic synchronization support.
- Source metadata (web/src/lib/sources.ts): Added GitHub Pages to the source metadata mapping under the CodeRepository category.
- Connector configuration (web/src/lib/connectors/connectors.tsx): Implemented form configuration with required fields for repository owner/name and an optional README inclusion setting, along with a TypeScript interface definition.
- Credentials setup (web/src/lib/connectors/credentials.ts): Configured a credential template reusing the existing GithubCredentialJson interface.
The connector addresses use cases where GitHub Pages sites are behind authentication or firewalls by accessing source files through the authenticated GitHub API. It includes proper error handling, rate limiting, checkpointing, and follows the established connector patterns throughout the Onyx codebase.
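The checkpointing mentioned above can be pictured roughly as follows. This is a sketch under stated assumptions: PagesCheckpoint and process_batch are hypothetical names, and the real CheckpointedConnector interface in Onyx may look quite different. The point is only that each call handles a bounded batch and returns a new checkpoint from which the run can resume:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PagesCheckpoint:
    """Hypothetical checkpoint: file paths still to process, and whether any remain."""
    pending: List[str] = field(default_factory=list)
    has_more: bool = True


def process_batch(
    checkpoint: PagesCheckpoint,
    batch_size: int,
    handle: Callable[[str], str],
) -> Tuple[List[str], PagesCheckpoint]:
    """Process up to batch_size pending paths and return (results, next checkpoint)."""
    batch = checkpoint.pending[:batch_size]
    results = [handle(path) for path in batch]
    remaining = checkpoint.pending[batch_size:]
    # The input checkpoint is left untouched, so a failed batch can be retried from it.
    return results, PagesCheckpoint(pending=remaining, has_more=bool(remaining))
```

Because the previous checkpoint is not mutated, an interrupted or rate-limited run can restart from the last checkpoint that was successfully returned.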
Confidence score: 4/5
- This PR is safe to merge with moderate confidence, requiring standard review attention
- Score reflects comprehensive implementation following established patterns, though some error handling could be more specific
- Pay closer attention to the main connector implementation file for error handling and the factory integration
10 files reviewed, no comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 issue found across 10 files
@Weves Open to feedback, and I appreciate you looking into this. I am not sure whether this PR covers all the requirements, so I might need some assistance.
GitHub Pages connector
Description
This PR introduces a new GitHub Pages connector and integrates it into both the backend and frontend of Onyx.
Test
Demo
Related Issue / Claim
Closes #2282
Creating a GitHub PAT for the GitHub Pages connector
- Onyx GitHub Pages
- No expiration (recommended for connectors)
- All repositories (or select specific repos)
- Contents → Read-only
- Metadata → Read-only
Using the token in Onyx
- repo_owner (e.g. melmathari)
- repo_name (e.g. GitHub-pages)
/claim #2282
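To show how a token with read-only Contents permission pairs with repo_owner and repo_name, here is a minimal sketch that builds (but does not send) a request to GitHub's REST "repository contents" endpoint. The helper name contents_request is hypothetical, and the PR's connector may well use a client library instead of raw HTTP:

```python
from urllib.parse import quote
from urllib.request import Request


def contents_request(token: str, repo_owner: str, repo_name: str, path: str) -> Request:
    """Build an authenticated GET request for one file or directory in a repository."""
    url = (
        "https://api.github.com/repos/"
        f"{quote(repo_owner)}/{quote(repo_name)}/contents/{quote(path)}"
    )
    return Request(
        url,
        headers={
            # The PAT is sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; with a fine-grained token, the request only succeeds for repositories the token was granted access to.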
Summary by cubic
Adds a GitHub Pages connector that indexes HTML/Markdown from a repo’s Pages site via the GitHub API and exposes it as a load-state connector in the app. Implements the flow requested in Linear #2282.
New Features
Frontend