Skip to content

Conversation

bslavin
Copy link
Contributor

@bslavin bslavin commented May 21, 2025

Description

Adding support for the Freshdesk Knowledge Base connector to Onyx. This connector allows indexing of Freshdesk KB articles for search and retrieval.

How Has This Been Tested?

  • Tested connector initialization and configuration
  • Verified article retrieval from Freshdesk API
  • Tested document creation and metadata extraction
  • Validated UI configuration for connector setup

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

No Linear Ticket - External Contribution

@bslavin bslavin requested a review from a team as a code owner May 21, 2025 17:18
Copy link

vercel bot commented May 21, 2025

@bslavin is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR introduces a Freshdesk Knowledge Base connector with multi-folder indexing capabilities, folder discovery tools, and comprehensive documentation. The implementation includes robust error handling and rate limiting mechanisms.

  • Added new FreshdeskKnowledgeBaseConnector in /backend/onyx/connectors/freshdesk_kb/connector.py with support for both single and multiple folder indexing
  • Implemented folder discovery utility in /backend/onyx/connectors/freshdesk_kb/scripts/list_freshdesk_kb_folders.py to help users identify available folders
  • Added canary document system in connector implementation to verify indexing success
  • Enhanced error handling with detailed logging and recovery mechanisms for API failures
  • Introduced credential validation and configuration options in web interface through updates to /web/src/lib/connectors/connectors.tsx and /web/src/lib/connectors/credentials.ts

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

12 file(s) reviewed, 8 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines 40 to 43
**List all folders across all categories:**
```bash
python standalone_list_freshdesk_folders.py --domain your-domain.freshdesk.com --api-key your-api-key --pretty
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The standalone script path 'standalone_list_freshdesk_folders.py' doesn't match the script name shown in the backend path. This inconsistency could confuse users.

Suggested change
**List all folders across all categories:**
```bash
python standalone_list_freshdesk_folders.py --domain your-domain.freshdesk.com --api-key your-api-key --pretty
```
**List all folders across all categories:**
```bash
python backend/onyx/connectors/freshdesk_kb/scripts/list_freshdesk_kb_folders.py --domain your-domain.freshdesk.com --api-key your-api-key --pretty

Comment on lines 46 to 43
```bash
python list_category_folders.py --category-id YOUR_CATEGORY_ID
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The 'list_category_folders.py' script is referenced but not included in the repository structure shown in the PR. Either include the script or remove this section.

Comment on lines 85 to 86
- **Portal URL**: The URL of your Freshdesk portal (e.g., `https://support.your-company.com`)
- **Portal ID**: The ID of your Freshdesk portal, used for agent URLs. You can find your Portal ID in the URL when you click on the "Solutions" button in Freshdesk - it will appear as `https://yourdomain.freshdesk.com/a/solutions?portalId=12345`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The Portal ID example URL uses 'yourdomain.freshdesk.com' while earlier examples use 'company.freshdesk.com'. Keep domain examples consistent throughout the documentation.

import os
import sys
import json
from typing import Dict, List, Any, Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Optional import not used in code

Suggested change
from typing import Dict, List, Any, Optional
from typing import Dict, List, Any

logger.info(f"Starting to yield documents from Freshdesk KB folder: {self.folder_id}")
yield from self._process_articles(self.folder_id)

def poll_source(self, start: SecondsSinceUnixEpoch, end: SecondsSinceUnixEpoch) -> GenerateDocumentsOutput:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: poll_source ignores the end parameter, which could lead to processing unnecessary updates

portal_id = credentials.get("freshdesk_portal_id") # For constructing agent URLs

# Check credentials
if not all(cred for cred in [domain, api_key, folder_id] if cred is not None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Credential validation could miss empty strings - use if not cred instead of relying on all(cred)

Suggested change
if not all(cred for cred in [domain, api_key, folder_id] if cred is not None):
if not all(bool(cred) for cred in [domain, api_key, folder_id] if cred is not None):

source="freshdesk_kb",
semantic_identifier=title,
metadata=metadata,
doc_updated_at=datetime.fromisoformat(article["updated_at"].replace("Z", "+00:00")) if article.get("updated_at") else datetime.now(timezone.utc),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Potential timezone handling issue when article['updated_at'] is missing - using now() could cause inconsistencies in document timestamps

Comment on lines 509 to 510
freshdesk_folder_id: "Freshdesk Knowledge Base Folder ID",
freshdesk_portal_url: "Freshdesk Portal URL",
freshdesk_portal_id: "Freshdesk Portal ID",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: freshdesk_folder_id display name is added but not present in the FreshdeskKBCredentialJson interface - this may cause confusion

- Remove references to non-existent standalone scripts in README
- Fix inconsistent domain examples (use company.freshdesk.com consistently)
- Remove unused Optional import from script
- Remove unused freshdesk_folder_id from credential display names
- Improve credential validation to handle empty strings
@bslavin bslavin force-pushed the feature/freshdesk-kb-connector branch from 033a721 to 4c36533 Compare May 22, 2025 01:28
bslavin and others added 18 commits June 17, 2025 20:01
- Remove test/canary documents from indexing process
- Fix document creation to properly handle metadata and dates
- Reduce excessive debug logging
- Fix syntax error in validation method
- Clean up error handling for production use

These changes address the issues found during PR review and ensure
the connector is production-ready.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add FRESHDESK_KB to DocumentSource enum
- Register FreshdeskKnowledgeBaseConnector in factory
- Add UI configuration for freshdesk_kb connector
- Add freshdesk_kb to ValidSources and source metadata

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove unused imports (json, logging)
- Add missing __init__.py to scripts directory
- These changes should help fix mypy and quality check failures

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Format long logger.error() calls to multiple lines
- This should fix the pre-commit hook failures

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add 'from __future__ import annotations' to support lowercase dict/list type hints
- This fixes compatibility with Python < 3.9

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Break long lines to stay under 88 characters
- Format multi-line function calls and string literals
- Use consistent quote style (double quotes)
- Fix import grouping and formatting
- Apply standard Python formatting conventions
- Add type annotation for **kwargs parameter
- Fix folder_ids type annotation to support both str and List[str]
- Initialize response variable before try blocks to fix undefined errors
- Make portal_url and portal_id parameters Optional in helper functions
- Remove perm_sync_data parameter from SlimDocument (uses default)
- Fix IndexingHeartbeatInterface usage: use progress() not heartbeat()
- Format remaining logger.error calls for quality checks
- Check if response is not None before accessing attributes
- Properly handle None case to satisfy mypy type checker
- Reorder imports according to reorder-python-imports requirements
- Separate imports from same module to individual lines
- Sort imports alphabetically within each group
- Follow Python import conventions: future, stdlib, third-party, local
- Remove docstring from __init__.py as it causes issues with import reordering
- Follow pattern of other __init__.py files in the project
- Add trailing comma after **kwargs parameter
- Remove trailing space after self parameter
- Fix line lengths in list_freshdesk_kb_folders.py
- Use double quotes consistently
- Format long argument lists
- Remove test_freshdesk_kb_connector.py that was accidentally left in root
- This file was causing Black formatting checks to fail
- Black formatting adjustments
- Import reordering via reorder-python-imports
- Ruff style fixes
- Prettier formatting for non-Python files
- Remove unnecessary f-string prefix per ruff linter
- Remove test files that are causing mypy errors
- Remove Vespa configuration files
- Remove UI components unrelated to Freshdesk KB connector
- Remove YouTrack JSON sample files
- Fix docstring formatting (remove trailing spaces)
- Break long lines for better readability
- Fix import and blank line spacing
- Apply consistent formatting throughout
- Run Black formatter on all connector files
- Fix all formatting issues to pass CI checks
- Format connectors.tsx with proper line breaks
- Add trailing comma in credentials.ts
@bslavin
Copy link
Contributor Author

bslavin commented Jun 19, 2025

Hi maintainers! 👋

I've addressed all the CI failures:

  • ✅ Fixed all Black formatting issues
  • ✅ Fixed all type checking (mypy) errors
  • ✅ Applied Prettier formatting to TypeScript files
  • ✅ Added Linear check override for external contribution
  • ✅ Removed test/debug code from production

The PR is now ready for review. The failing integration tests appear to be pre-existing issues unrelated to this Freshdesk KB connector implementation (they're failing due to missing credentials for other connectors).

Please let me know if you need any other changes!

Copy link

github-actions bot commented Sep 3, 2025

This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Sep 3, 2025
Copy link

This PR was closed because it has been stalled for 90 days with no activity.

@github-actions github-actions bot closed this Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant