fix: prevent search_code and index_codebase from crashing MCP process#72
fix: prevent search_code and index_codebase from crashing MCP process#72ProxiBlue wants to merge 1 commit intopleaseai:mainfrom
Conversation
Summary of ChangesHello @ProxiBlue, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability and responsiveness of the MCP process by addressing a critical issue where interactions with remote vector databases could lead to application crashes. By refactoring the cloud synchronization logic to be non-blocking, incorporating a timeout mechanism, and ensuring single-flight execution, the system can now handle potentially slow or unresponsive external services without impacting core functionality or user experience. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly identifies and resolves a critical issue where a hanging network call could crash the main process. The solution to use a fire-and-forget, single-flight, and timeout-guarded function is a robust approach. The implementation is clean and effectively decouples the long-running sync operation from the request handlers. My review includes one suggestion to prevent a minor resource leak from dangling timers. Overall, this is a solid fix for a potentially serious stability problem.
| this.cloudSyncInFlight = true | ||
|
|
||
| const timeout = new Promise<void>((_, reject) => | ||
| setTimeout( | ||
| () => reject(new Error('Cloud sync timed out')), | ||
| ToolHandlers.CLOUD_SYNC_TIMEOUT_MS, | ||
| ), | ||
| ) | ||
|
|
||
| Promise.race([this.syncIndexedCodebasesFromCloud(), timeout]) | ||
| .catch((err) => { | ||
| console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`) | ||
| }) | ||
| .finally(() => { | ||
| this.cloudSyncInFlight = false | ||
| }) |
There was a problem hiding this comment.
There's a minor resource leak here. If syncIndexedCodebasesFromCloud() completes or fails before the timeout, the timer created by setTimeout is not cleared and will continue to run until it elapses. In a long-running process, this could lead to an accumulation of dangling timers. It's good practice to clear the timer in the finally block to prevent this.
this.cloudSyncInFlight = true;
let timeoutId: NodeJS.Timeout;
const timeout = new Promise<void>((_, reject) => {
timeoutId = setTimeout(
() => reject(new Error('Cloud sync timed out')),
ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
);
});
Promise.race([this.syncIndexedCodebasesFromCloud(), timeout])
.catch((err) => {
console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`);
})
.finally(() => {
clearTimeout(timeoutId);
this.cloudSyncInFlight = false;
});871a27a to
f193d78
Compare
|
@gemini-code-assist please re-review — the |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a critical issue where blocking cloud sync calls could crash the MCP process. The introduction of a non-blocking, fire-and-forget nonBlockingCloudSync method with a timeout and single-flight protection is a robust solution. The changes in handleIndexCodebase and handleSearchCode correctly implement this new mechanism. I have one suggestion to further improve the robustness of the new sync method by wrapping its logic in an async IIFE to prevent potential unhandled promise rejections.
| if (this.cloudSyncInFlight) { | ||
| return | ||
| } | ||
|
|
||
| this.cloudSyncInFlight = true | ||
|
|
||
| let timeoutId: NodeJS.Timeout | ||
| const timeout = new Promise<void>((_, reject) => { | ||
| timeoutId = setTimeout( | ||
| () => reject(new Error('Cloud sync timed out')), | ||
| ToolHandlers.CLOUD_SYNC_TIMEOUT_MS, | ||
| ) | ||
| }) | ||
|
|
||
| Promise.race([this.syncIndexedCodebasesFromCloud(), timeout]) | ||
| .catch((err) => { | ||
| console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`) | ||
| }) | ||
| .finally(() => { | ||
| clearTimeout(timeoutId) | ||
| this.cloudSyncInFlight = false | ||
| }) |
There was a problem hiding this comment.
The current implementation creates a "floating promise" by not returning or awaiting the result of Promise.race(...). While this achieves the fire-and-forget behavior, it can lead to unhandled promise rejections if an error were to occur in the .finally() block. Additionally, the way timeoutId is declared can be fragile and may fail stricter TypeScript compiler checks.
A more robust pattern is to wrap the logic in an async IIFE (Immediately Invoked Function Expression). This ensures the entire promise chain is properly handled and provides a cleaner scope for the timeout logic, making the code safer and easier to maintain.
if (this.cloudSyncInFlight) {
return
}
this.cloudSyncInFlight = true;
(async () => {
let timeoutId: NodeJS.Timeout | undefined
try {
const timeoutPromise = new Promise<never>((_, reject) => {
timeoutId = setTimeout(
() => reject(new Error('Cloud sync timed out')),
ToolHandlers.CLOUD_SYNC_TIMEOUT_MS,
)
})
await Promise.race([this.syncIndexedCodebasesFromCloud(), timeoutPromise])
}
catch (err: any) {
console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`)
}
finally {
if (timeoutId) {
clearTimeout(timeoutId)
}
this.cloudSyncInFlight = false
}
})()syncIndexedCodebasesFromCloud() was called with await in both handleSearchCode and handleIndexCodebase. This method iterates all Milvus collections and calls ensureLoaded() on each — an expensive operation with no timeout. On remote/shared Milvus instances, hung ensureLoaded() calls block the handler indefinitely, causing the stdio transport to drop and the MCP process to crash with "socket hang up". Replace the blocking await calls with a non-blocking fire-and-forget wrapper that: - Never blocks the calling handler - Applies a 10-second timeout guard - Uses single-flight protection to prevent overlapping syncs The startup sync via SyncManager.startBackgroundSync() continues to handle the initial cloud-to-local reconciliation. Fixes pleaseai#71 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
f193d78 to
a37e167
Compare
|
There was a problem hiding this comment.
Pull request overview
Updates MCP tool handlers to avoid process crashes caused by blocking/hanging cloud sync calls against remote Milvus/Zilliz instances.
Changes:
- Added a fire-and-forget
nonBlockingCloudSync()wrapper with a timeout guard and “single-flight” flag. - Replaced blocking
await this.syncIndexedCodebasesFromCloud()calls inindex_codebaseandsearch_codewith the non-blocking wrapper.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private nonBlockingCloudSync(): void { | ||
| if (this.cloudSyncInFlight) { | ||
| return | ||
| } | ||
|
|
||
| this.cloudSyncInFlight = true |
There was a problem hiding this comment.
There are integration tests for ToolHandlers; it would be good to add coverage for this new behavior (e.g., handler returns even if cloud sync never resolves, timeout logs and doesn’t crash, and repeated calls don’t start overlapping syncs).
| console.warn(`[SYNC-CLOUD] Non-blocking cloud sync failed: ${err.message || err}`) | ||
| }) | ||
| .finally(() => { | ||
| clearTimeout(timeoutId) |
There was a problem hiding this comment.
In strict TS mode, timeoutId is not definitely assigned when used in clearTimeout(timeoutId), which will cause a compile error (TS2454). Consider initializing it as ReturnType<typeof setTimeout> | undefined (and guarding clearTimeout), or restructure so the timeout handle is created before it’s referenced in finally.
| .finally(() => { | ||
| clearTimeout(timeoutId) | ||
| this.cloudSyncInFlight = false | ||
| }) |
There was a problem hiding this comment.
cloudSyncInFlight is reset to false in finally when the timeout wins the Promise.race, but the underlying syncIndexedCodebasesFromCloud() promise will still be running/hung. Subsequent calls can then start additional syncs, defeating the “single-flight” protection and potentially piling up hung operations. Consider tying cloudSyncInFlight to the actual sync promise lifecycle (or add a cooldown/circuit-breaker) rather than the race result.
| * Fire-and-forget cloud sync with a timeout guard. | ||
| * | ||
| * - Never blocks the calling handler. | ||
| * - Prevents overlapping syncs (single-flight). | ||
| * - Aborts after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a | ||
| * remote Milvus instance cannot crash the MCP process. |
There was a problem hiding this comment.
This comment says the sync “Aborts after CLOUD_SYNC_TIMEOUT_MS”, but Promise.race doesn’t cancel the underlying syncIndexedCodebasesFromCloud() work—it only stops waiting for it. Please reword to avoid implying cancellation, or implement actual cancellation if the vector DB client supports it.
| * Fire-and-forget cloud sync with a timeout guard. | |
| * | |
| * - Never blocks the calling handler. | |
| * - Prevents overlapping syncs (single-flight). | |
| * - Aborts after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a | |
| * remote Milvus instance cannot crash the MCP process. | |
| * Fire-and-forget cloud sync with a timeout guard on waiting. | |
| * | |
| * - Never blocks the calling handler. | |
| * - Prevents overlapping syncs (single-flight). | |
| * - Stops waiting after CLOUD_SYNC_TIMEOUT_MS so a hung ensureLoaded() call on a | |
| * remote Milvus instance cannot block or crash the MCP process. |



Summary
await this.syncIndexedCodebasesFromCloud()inhandleSearchCodeandhandleIndexCodebasewith a non-blocking fire-and-forget wrapperensureLoaded()call on a remote Milvus instance cannot kill the MCP processFixes #71
Problem
search_codeandindex_codebasebothawaitsyncIndexedCodebasesFromCloud()before processing the request. This method iterates all collections in the vector database and callsensureLoaded()on each one. On a remote or shared Milvus/Zilliz instance,ensureLoaded()can hang indefinitely (the gRPC client has no default timeout), which blocks the handler, causes the stdio transport to drop, and crashes the MCP process withsocket hang up.get_indexing_statusandclear_indexdo not call this method and work correctly — confirming the sync call is the crash point.This is also redundant work —
SyncManager.startBackgroundSync()already performs cloud sync on MCP server startup.Changes
packages/mcp/src/handlers.tsAdded
nonBlockingCloudSync()method:Promise.racewith a 10-second timeout — if sync doesn't complete in time, it's abandoned gracefullycloudSyncInFlight) prevents overlapping syncsReplaced both
await this.syncIndexedCodebasesFromCloud()calls withthis.nonBlockingCloudSync()Test plan
MILVUS_ADDRESS=x.x.x.x:19530)search_code— should return results or "not indexed" message without crashingindex_codebase— should start indexing without crashingSyncManagerstill runs on process startsearch_codecalls don't trigger overlapping syncs🤖 Generated with Claude Code