wenxi-onyx
Member

@wenxi-onyx wenxi-onyx commented Aug 12, 2025

Description

  • title

How Has This Been Tested?

- locally

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Fixed seeded document count by marking documents as indexed during the seeding process to ensure accurate totals.

@wenxi-onyx wenxi-onyx requested a review from a team as a code owner August 12, 2025 20:44

vercel bot commented Aug 12, 2025

The latest updates on your projects:

Project | Deployment | Updated (UTC)
internal-search | Ready | Aug 12, 2025 8:50pm

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR fixes a bug in the document seeding flow by adding a call to mark_document_as_indexed_for_cc_pair__no_commit in the seed_initial_documents function. The change ensures that seeded documents are properly marked as indexed in the DocumentByConnectorCredentialPair table after being indexed into Vespa.

The issue was that the seeding process was successfully indexing documents into the Vespa search engine and updating their chunk counts in the database, but it wasn't marking these documents as indexed in the relationship table that tracks which documents have been processed for each connector-credential pair. This created an inconsistent state where documents were searchable but the system didn't properly track their indexed status.

The fix is strategically placed after the Vespa indexing operation but before the mock index attempt creation, ensuring the database state accurately reflects the indexed documents while maintaining transactional consistency by using the __no_commit variant of the function.
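The ordering described above can be illustrated with a minimal, self-contained sketch. It is not the Onyx implementation: `FakeSession`, `DocByCCPairRow`, `mark_indexed_no_commit`, `indexed_doc_count`, `seed_documents`, and the `has_been_indexed` flag are all hypothetical stand-ins, and the sketch assumes the total document count is derived from rows flagged as indexed.

```python
from dataclasses import dataclass, field


@dataclass
class DocByCCPairRow:
    """Hypothetical stand-in for one DocumentByConnectorCredentialPair row."""
    document_id: str
    connector_id: int
    credential_id: int
    has_been_indexed: bool = False  # assumed flag name, for illustration only


@dataclass
class FakeSession:
    """In-memory stand-in for a database session; it only counts commits."""
    rows: list = field(default_factory=list)
    commits: int = 0

    def commit(self) -> None:
        self.commits += 1


def mark_indexed_no_commit(session, connector_id, credential_id, document_ids):
    """Flag matching rows as indexed. The caller is responsible for committing."""
    wanted = set(document_ids)
    for row in session.rows:
        same_pair = (row.connector_id, row.credential_id) == (connector_id, credential_id)
        if same_pair and row.document_id in wanted:
            row.has_been_indexed = True


def indexed_doc_count(session, connector_id, credential_id):
    """Count documents flagged as indexed for one connector-credential pair."""
    return sum(
        1
        for row in session.rows
        if (row.connector_id, row.credential_id) == (connector_id, credential_id)
        and row.has_been_indexed
    )


def seed_documents(session, connector_id, credential_id, document_ids, mark_indexed):
    """Seed documents; `mark_indexed` toggles the step this PR adds."""
    for doc_id in document_ids:
        session.rows.append(DocByCCPairRow(doc_id, connector_id, credential_id))
    # Indexing into the search engine would happen here.
    if mark_indexed:
        mark_indexed_no_commit(session, connector_id, credential_id, document_ids)
    # Creating the mock index attempt would happen here.
    session.commit()  # a single commit covers every step above
```

Calling `seed_documents(..., mark_indexed=False)` leaves `indexed_doc_count` at zero even though the rows exist, which mirrors the inconsistent state described in the summary. With `mark_indexed=True` the count matches the number of seeded documents, and the session is still committed only once.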

Confidence score: 4/5

  • This PR is safe to merge with minimal risk of breaking existing functionality
  • Score reflects a straightforward bug fix that addresses a clear state inconsistency issue without modifying critical logic flows
  • Pay close attention to the placement of the new function call to ensure it maintains proper transactional boundaries

1 file reviewed, no comments

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

cubic analysis

No issues found across 1 file.

@Weves Weves enabled auto-merge August 12, 2025 20:53
@Weves Weves added this pull request to the merge queue Aug 13, 2025
Merged via the queue into main with commit 55dc24f Aug 13, 2025
16 of 18 checks passed
@Weves Weves deleted the bugfix/seeded_total_docs branch August 13, 2025 01:32
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025
* fix seeded total doc count

* fix seeded total doc count