Skip to content

Comments

fix: prevent search_code and index_codebase from crashing MCP process#72

Open
ProxiBlue wants to merge 1 commit intopleaseai:mainfrom
ProxiBlue:fix/search-cloud-sync-crash
Open

fix: prevent search_code and index_codebase from crashing MCP process#72
ProxiBlue wants to merge 1 commit intopleaseai:mainfrom
ProxiBlue:fix/search-cloud-sync-crash

Conversation

@ProxiBlue
Copy link
Contributor

Disclosure: This PR was generated with assistance from Claude Code (Anthropic). My TypeScript knowledge is limited — thorough review is appreciated.

Summary

  • Replaces blocking await this.syncIndexedCodebasesFromCloud() in handleSearchCode and handleIndexCodebase with a non-blocking fire-and-forget wrapper
  • Adds a 10-second timeout guard so a hung ensureLoaded() call on a remote Milvus instance cannot kill the MCP process
  • Adds single-flight protection to prevent overlapping sync operations

Fixes #71

Problem

search_code and index_codebase both await syncIndexedCodebasesFromCloud() before processing the request. This method iterates all collections in the vector database and calls ensureLoaded() on each one. On a remote or shared Milvus/Zilliz instance, ensureLoaded() can hang indefinitely (the gRPC client has no default timeout), which blocks the handler, causes the stdio transport to drop, and crashes the MCP process with socket hang up.

get_indexing_status and clear_index do not call this method and work correctly — confirming the sync call is the crash point.

This is also redundant work — SyncManager.startBackgroundSync() already performs cloud sync on MCP server startup.

Changes

packages/mcp/src/handlers.ts

  1. Added nonBlockingCloudSync() method:

    • Fire-and-forget (does not return a Promise to the caller)
    • Promise.race with a 10-second timeout — if sync doesn't complete in time, it's abandoned gracefully
    • Single-flight flag (cloudSyncInFlight) prevents overlapping syncs
    • Errors are caught and logged as warnings, never propagated to the handler
  2. Replaced both await this.syncIndexedCodebasesFromCloud() calls with this.nonBlockingCloudSync()

Test plan

  • Configure with a remote Milvus instance (MILVUS_ADDRESS=x.x.x.x:19530)
  • Call search_code — should return results or "not indexed" message without crashing
  • Call index_codebase — should start indexing without crashing
  • Verify startup sync via SyncManager still runs on process start
  • Verify that rapid successive search_code calls don't trigger overlapping syncs

🤖 Generated with Claude Code

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 2, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ProxiBlue, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and responsiveness of the MCP process by addressing a critical issue where interactions with remote vector databases could lead to application crashes. By refactoring the cloud synchronization logic to be non-blocking, incorporating a timeout mechanism, and ensuring single-flight execution, the system can now handle potentially slow or unresponsive external services without impacting core functionality or user experience.

Highlights

  • Preventing MCP Process Crashes: Replaced blocking await this.syncIndexedCodebasesFromCloud() calls in handleSearchCode and handleIndexCodebase with a new non-blocking, fire-and-forget mechanism to prevent the MCP process from crashing due to hung ensureLoaded() calls on remote Milvus instances.
  • Timeout Guard for Cloud Sync: Introduced a 10-second timeout for the non-blocking cloud synchronization, ensuring that if the sync operation takes too long, it is gracefully abandoned without blocking the main process.
  • Single-Flight Protection: Implemented single-flight protection for cloud sync operations, preventing multiple concurrent synchronization attempts and ensuring that only one sync runs at a time.
  • Error Handling: Errors during the non-blocking cloud sync are now caught and logged as warnings, preventing them from propagating and affecting the main application flow.
Changelog
  • packages/mcp/src/handlers.ts
    • Added a new private method nonBlockingCloudSync() which encapsulates the fire-and-forget cloud synchronization logic, including timeout and single-flight protection.
    • Replaced direct await this.syncIndexedCodebasesFromCloud() calls with this.nonBlockingCloudSync() in both handleIndexCodebase and handleSearchCode methods.
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly identifies and resolves a critical issue where a hanging network call could crash the main process. The solution to use a fire-and-forget, single-flight, and timeout-guarded function is a robust approach. The implementation is clean and effectively decouples the long-running sync operation from the request handlers. My review includes one suggestion to prevent a minor resource leak from dangling timers. Overall, this is a solid fix for a potentially serious stability problem.

Comment on lines 39 to 54
this.cloudSyncInFlight = true

const timeout = new Promise<void>((_, reject) =>
setTimeout(
() => reject(new Error('Cloud sync timed out')),
ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
),
)

Promise.race([this.syncIndexedCodebasesFromCloud(), timeout])
.catch((err) => {
console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`)
})
.finally(() => {
this.cloudSyncInFlight = false
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a minor resource leak here. If syncIndexedCodebasesFromCloud() completes or fails before the timeout, the timer created by setTimeout is not cleared and will continue to run until it elapses. In a long-running process, this could lead to an accumulation of dangling timers. It's good practice to clear the timer in the finally block to prevent this.

    this.cloudSyncInFlight = true;

    let timeoutId: NodeJS.Timeout;
    const timeout = new Promise<void>((_, reject) => {
      timeoutId = setTimeout(
        () => reject(new Error('Cloud sync timed out')),
        ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
      );
    });

    Promise.race([this.syncIndexedCodebasesFromCloud(), timeout])
      .catch((err) => {
        console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`);
      })
      .finally(() => {
        clearTimeout(timeoutId);
        this.cloudSyncInFlight = false;
      });

@dosubot dosubot bot added the type: bug Something isn't working label Feb 3, 2026
@ProxiBlue ProxiBlue force-pushed the fix/search-cloud-sync-crash branch from 871a27a to f193d78 Compare February 3, 2026 00:02
@ProxiBlue
Copy link
Contributor Author

@gemini-code-assist please re-review — the clearTimeout fix from your review has been applied in the updated commit. The timer is now cleared in the finally block to prevent the resource leak you identified.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses a critical issue where blocking cloud sync calls could crash the MCP process. The introduction of a non-blocking, fire-and-forget nonBlockingCloudSync method with a timeout and single-flight protection is a robust solution. The changes in handleIndexCodebase and handleSearchCode correctly implement this new mechanism. I have one suggestion to further improve the robustness of the new sync method by wrapping its logic in an async IIFE to prevent potential unhandled promise rejections.

Comment on lines +35 to +56
if (this.cloudSyncInFlight) {
return
}

this.cloudSyncInFlight = true

let timeoutId: NodeJS.Timeout
const timeout = new Promise<void>((_, reject) => {
timeoutId = setTimeout(
() => reject(new Error('Cloud sync timed out')),
ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
)
})

Promise.race([this.syncIndexedCodebasesFromCloud(), timeout])
.catch((err) => {
console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`)
})
.finally(() => {
clearTimeout(timeoutId)
this.cloudSyncInFlight = false
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation creates a "floating promise" by not returning or awaiting the result of Promise.race(...). While this achieves the fire-and-forget behavior, it can lead to unhandled promise rejections if an error were to occur in the .finally() block. Additionally, the way timeoutId is declared can be fragile and may fail stricter TypeScript compiler checks.

A more robust pattern is to wrap the logic in an async IIFE (Immediately Invoked Function Expression). This ensures the entire promise chain is properly handled and provides a cleaner scope for the timeout logic, making the code safer and easier to maintain.

    if (this.cloudSyncInFlight) {
      return
    }

    this.cloudSyncInFlight = true;

    (async () => {
      let timeoutId: NodeJS.Timeout | undefined
      try {
        const timeoutPromise = new Promise<never>((_, reject) => {
          timeoutId = setTimeout(
            () => reject(new Error('Cloud sync timed out')),
            ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
          )
        })

        await Promise.race([this.syncIndexedCodebasesFromCloud(), timeoutPromise])
      }
      catch (err: any) {
        console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`)
      }
      finally {
        if (timeoutId) {
          clearTimeout(timeoutId)
        }
        this.cloudSyncInFlight = false
      }
    })()

syncIndexedCodebasesFromCloud() was called with await in both
handleSearchCode and handleIndexCodebase. This method iterates all
Milvus collections and calls ensureLoaded() on each — an expensive
operation with no timeout. On remote/shared Milvus instances, hung
ensureLoaded() calls block the handler indefinitely, causing the
stdio transport to drop and the MCP process to crash with
"socket hang up".

Replace the blocking await calls with a non-blocking fire-and-forget
wrapper that:
- Never blocks the calling handler
- Applies a 10-second timeout guard
- Uses single-flight protection to prevent overlapping syncs

The startup sync via SyncManager.startBackgroundSync() continues to
handle the initial cloud-to-local reconciliation.

Fixes pleaseai#71

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@amondnet amondnet force-pushed the fix/search-cloud-sync-crash branch from f193d78 to a37e167 Compare February 3, 2026 03:08
@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 3, 2026

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates MCP tool handlers to avoid process crashes caused by blocking/hanging cloud sync calls against remote Milvus/Zilliz instances.

Changes:

  • Added a fire-and-forget nonBlockingCloudSync() wrapper with a timeout guard and “single-flight” flag.
  • Replaced blocking await this.syncIndexedCodebasesFromCloud() calls in index_codebase and search_code with the non-blocking wrapper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +34 to +39
private nonBlockingCloudSync(): void {
if (this.cloudSyncInFlight) {
return
}

this.cloudSyncInFlight = true
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are integration tests for ToolHandlers; it would be good to add coverage for this new behavior (e.g., handler returns even if cloud sync never resolves, timeout logs and doesn’t crash, and repeated calls don’t start overlapping syncs).

Copilot uses AI. Check for mistakes.
console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`)
})
.finally(() => {
clearTimeout(timeoutId)
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In strict TS mode, timeoutId is not definitely assigned when used in clearTimeout(timeoutId), which will cause a compile error (TS2454). Consider initializing it as ReturnType<typeof setTimeout> | undefined (and guarding clearTimeout), or restructure so the timeout handle is created before it’s referenced in finally.

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +56
.finally(() => {
clearTimeout(timeoutId)
this.cloudSyncInFlight = false
})
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloudSyncInFlight is reset to false in finally when the timeout wins the Promise.race, but the underlying syncIndexedCodebasesFromCloud() promise will still be running/hung. Subsequent calls can then start additional syncs, defeating the “single-flight” protection and potentially piling up hung operations. Consider tying cloudSyncInFlight to the actual sync promise lifecycle (or add a cooldown/circuit-breaker) rather than the race result.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +32
* Fire-and-forget cloud sync with a timeout guard.
*
* - Never blocks the calling handler.
* - Prevents overlapping syncs (single-flight).
* - Aborts after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a
* remote Milvus instance cannot crash the MCP process.
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says the sync “Aborts after CLOUD_SYNC_TIMEOUT_MS”, but Promise.race doesn’t cancel the underlying syncIndexedCodebasesFromCloud() work—it only stops waiting for it. Please reword to avoid implying cancellation, or implement actual cancellation if the vector DB client supports it.

Suggested change
* Fire-and-forget cloud sync with a timeout guard.
*
* - Never blocks the calling handler.
* - Prevents overlapping syncs (single-flight).
* - Aborts after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a
* remote Milvus instance cannot crash the MCP process.
* Fire-and-forget cloud sync with a timeout guard on waiting.
*
* - Never blocks the calling handler.
* - Prevents overlapping syncs (single-flight).
* - Stops waiting after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a
* remote Milvus instance cannot block or crash the MCP process.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files. type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

search_code and index_codebase crash MCP process due to blocking syncIndexedCodebasesFromCloud

1 participant