suvodhoy
Contributor

Description

When a Discourse category is requested via <baseUrl>/c/<categoryId>.json?page={pageNumber}&sys=latest, Discourse redirects to <baseUrl>/c/<categorySlug>/<categoryId>.json?..... During this redirect, every numeric occurrence in the query string that matches the categoryId is rewritten as well, including the page parameter, which corrupts the request (e.g., page=5 becomes page=community/5).

This PR updates the connector to request the slug-based category URL directly, which avoids the redirect and keeps the query parameters intact.
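A minimal sketch of the idea, not the connector's actual code: build the slug-based path up front so no redirect is needed. The helper name `build_category_url` and the host `discourse.example.com` are hypothetical; the `page` and `sys=latest` parameters are taken from the description above.

```python
from urllib.parse import urlencode


def build_category_url(base_url: str, slug: str, category_id: int, page: int) -> str:
    """Return the slug-based category URL (hypothetical helper for illustration)."""
    # Encode the query separately so the page number is never mixed with the path.
    query = urlencode({"page": page, "sys": "latest"})
    return f"{base_url.rstrip('/')}/c/{slug}/{category_id}.json?{query}"


url = build_category_url("https://discourse.example.com", "community", 5, 5)
print(url)
```

Because the request already targets c/<categorySlug>/<categoryId>.json, the page value should reach the server unchanged.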

How to reproduce?

Open https://meta.discourse.org/c/10.json?page=10&sys=latest in a browser. It returns a 400 Bad Request error. Inspecting the resulting URL shows that the page number has been replaced by the slug/categoryId.
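The observed corruption can be mimicked with a toy model. This is not Discourse's redirect code, only an illustration of what happens when the id is replaced across the whole URL string rather than in the path alone. The function `naive_redirect` and the slug `example-slug` are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def naive_redirect(path_and_query: str, category_id: int, slug: str) -> str:
    """Toy model: rewrite every occurrence of the id, including the query string."""
    return path_and_query.replace(str(category_id), f"{slug}/{category_id}")


redirected = naive_redirect("/c/10.json?page=10&sys=latest", 10, "example-slug")
# The page parameter no longer holds a number after the rewrite.
page_value = parse_qs(urlsplit(redirected).query)["page"][0]
print(redirected)
print(page_value)
```

In this model the page value ends up as `example-slug/10`, which matches the shape of the corruption described in the PR (page=5 becoming page=community/5).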

How Has This Been Tested?

I found the issue while syncing my own Discourse setup. I tested the changes by building the background service locally and running the sync.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@suvodhoy suvodhoy requested a review from a team as a code owner May 28, 2025 15:09

vercel bot commented May 28, 2025

@suvodhoy is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

Fixed a URL redirect issue in the Discourse connector by properly constructing category URLs with slugs to prevent query parameter corruption during redirects.

  • Modified category_id_map in /backend/onyx/connectors/discourse/connector.py to store both category name and slug information
  • Updated URL construction in _get_latest_topics() to use f"c/{category_dict['slug']}/{category_id}.json" format, preventing parameter corruption
  • Added validation for empty categories to remove them from category_id_map when no topics are found
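The three changes in this summary can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the code in connector.py: the names `build_category_map`, `prune_empty_categories`, and the `fetch_topics` callback are hypothetical stand-ins, and the input records are assumed to carry `id`, `name`, and `slug` keys.

```python
from typing import Callable

CategoryMap = dict[int, dict[str, str]]


def build_category_map(categories: list[dict]) -> CategoryMap:
    """Store both name and slug per category id (assumed record shape)."""
    return {c["id"]: {"name": c["name"], "slug": c["slug"]} for c in categories}


def prune_empty_categories(
    category_map: CategoryMap, fetch_topics: Callable[[str], list]
) -> CategoryMap:
    """Drop categories whose slug-based endpoint returns no topics."""
    # Iterate over a copy of the keys because entries are deleted in the loop.
    for category_id in list(category_map):
        slug = category_map[category_id]["slug"]
        path = f"c/{slug}/{category_id}.json"
        if not fetch_topics(path):
            del category_map[category_id]
    return category_map


# Stand-in fetcher: only the "community" category has topics.
def fake_fetch(path: str) -> list:
    return [{"id": 1}] if path == "c/community/5.json" else []


cats = [
    {"id": 5, "name": "Community", "slug": "community"},
    {"id": 7, "name": "Empty", "slug": "empty"},
]
result = prune_empty_categories(build_category_map(cats), fake_fetch)
print(result)
```

Keeping the slug next to the name means the path can be formed without a second lookup, and pruning keeps later passes from re-querying categories that have nothing to index.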

1 file(s) reviewed, no comment(s)

@suvodhoy
Contributor Author

Raised a fix in the discourse repo as well.

@suvodhoy
Contributor Author

Hi @Weves, can you review this PR?

@wenxi-onyx
Member

@suvodhoy Good catch on this one! Since this is fixed in Discourse, is this PR still applicable for Onyx?

@suvodhoy
Contributor Author

Hi @wenxi-onyx, although this issue will be fixed in the latest versions of Discourse, some users may still be on older versions where the bug exists and could keep hitting failures during data sync. So I'd recommend handling this in Onyx as well to ensure broader compatibility.

@wenxi-onyx wenxi-onyx merged commit accd363 into onyx-dot-app:main Jun 19, 2025
3 of 9 checks passed
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025