fix(amazonq): properly handle encoder server exception #5585


Closed
wants to merge 2 commits

Conversation

@leigaol commented Apr 16, 2025

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Description

The encoder server is a single-threaded server that may become unresponsive during heavy computation. At the beginning of indexing, the encoder server first parses all js, ts, py, and java files using tree-sitter, then proceeds to build the vector index.

I have observed that if a repo has >25k large java files, for example https://github.com/elastic/elasticsearch (1.4 GB), tree-sitter parsing can take 6 min 20 s. I tried opening and indexing the elasticsearch repo 20 times; in about 10 of those runs I saw java.net.ConnectException: Connection refused. This indicates that the client failed to complete a TCP handshake with the server: because the server is busy parsing files, the Node.js event loop is not free to handle the TCP handshake but is fully occupied with parsing. This also explains why we sometimes see the project context java.net.SocketTimeoutException: Read timed out message.
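
For illustration, here is a minimal Kotlin sketch of the kind of request the stack trace below originates from. Per the trace, ProjectContextProvider talks to the local encoder server over HttpURLConnection; the port, path, and timeout values here are assumptions, not the plugin's actual configuration:

    import java.net.HttpURLConnection
    import java.net.URL

    // Minimal sketch; the real port and path come from the encoder server setup.
    fun sendIndexRequest(payload: String): Int {
        val conn = URL("http://127.0.0.1:3000/indexFiles").openConnection() as HttpURLConnection
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.connectTimeout = 5_000   // ConnectException: handshake never accepted
        conn.readTimeout = 30_000     // SocketTimeoutException: event loop stays busy
        conn.outputStream.use { it.write(payload.toByteArray()) }
        return conn.responseCode      // the getResponseCode() frame in the trace below
    }

While the server's single Node.js thread is stuck in parsing, the connect attempt is either refused outright (ConnectException) or an accepted connection is never serviced (SocketTimeoutException).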

In the test case (25k large java files), the long-running index HTTP call has roughly a 50% chance of failing with connection refused; when the connection is refused, the LSP process also quits. The chance of running into this issue is significantly smaller for small repos, because parsing finishes very quickly, and for repos that are not js/ts/py/java, because we do not parse other languages (that is why this bug is not reproducible in the aws-toolkit-jetbrains Kotlin repo).

When such a connection refused error occurred, we retried indexing, which interrupts the in-flight indexing and performs repetitive indexing; combined with the log looping issue fixed in 3357e88, this contributed to the IDE performance issue.
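
A minimal sketch of the revised handling, assuming a surrounding sendMsgToLsp-style function and reusing the hypothetical sendIndexRequest above; the encoderServer and logger collaborators mirror the diff quoted later in this thread, but the enclosing function is assumed:

    import java.net.ConnectException

    // Sketch only: encoderServer, logger, and this function are assumptions.
    fun sendMsgToLspSafely(payload: String) {
        try {
            sendIndexRequest(payload)
        } catch (e: ConnectException) {
            if (encoderServer.isNodeProcessRunning()) {
                // The server process is alive but its event loop is busy parsing:
                // treat the refusal as transient instead of retrying, since a
                // retry would break the in-flight indexing.
                return
            } else {
                // The process actually died; retrying cannot help.
                logger.warn(e) { "project context process quit unexpectedly" }
            }
        }
    }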

The vector indexing process already has a "break out of the current event loop" design that pauses indexing and watches OS CPU/memory usage; I have yet to see a java.net.ConnectException: Connection refused error during vector indexing.
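
That pause-and-check design lives in the Node.js encoder server and is not part of this diff; as a rough Kotlin analogue of the pattern (the chunking, delays, and 50% CPU threshold are all illustrative assumptions):

    import com.sun.management.OperatingSystemMXBean
    import java.lang.management.ManagementFactory
    import kotlinx.coroutines.delay

    fun indexChunk(chunk: List<String>) { /* hypothetical unit of indexing work */ }

    suspend fun buildIndexInChunks(chunks: List<List<String>>) {
        val os = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean::class.java)
        for (chunk in chunks) {
            indexChunk(chunk)
            delay(10)                 // yield so other events (e.g. HTTP) can be served
            while (os.cpuLoad > 0.5) {
                delay(1_000)          // back off while the OS is under load
            }
        }
    }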

WARN - software.aws.toolkits.jetbrains.services.amazonq.project.ProjectContextProvider - failed to init project context

java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
    at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:592)
    at java.base/java.net.Socket.connect(Socket.java:751)
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:531)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:636)
    at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:948)
    at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:759)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1705)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1614)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:531)
    at software.aws.toolkits.jetbrains.services.amazonq.project.ProjectContextProvider$sendMsgToLsp$4.invokeSuspend(ProjectContextProvider.kt:341)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:608)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:873)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:763)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:750)

Checklist

  • My code follows the code style of this project
  • I have added tests to cover my changes
  • A short description of the change has been added to the CHANGELOG if the change is customer-facing in the IDE.
  • I have added metrics for my changes (if required)

License

I confirm that my contribution is made under the terms of the Apache 2.0 license.

@leigaol requested a review from a team as a code owner April 16, 2025 04:52
@leigaol requested a review from Copilot April 16, 2025 05:11
Copilot AI left a comment

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

plugins/amazonq/shared/jetbrains-community/src/software/aws/toolkits/jetbrains/services/amazonq/project/ProjectContextProvider.kt:127

  • [nitpick] Consider logging a debug-level message before returning in this branch to clearly indicate that a 'Connection refused' error is expected to be temporarily ignored because the encoder server is busy with Tree-sitter parsing.
if (encoderServer.isNodeProcessRunning()) {

@leigaol marked this pull request as draft April 16, 2025 05:19
@leigaol changed the title from "fix(amazonq): properly handle connection refused from encoder server" to "fix(amazonq): properly handle encoder server exception" Apr 16, 2025
@leigaol closed this Apr 16, 2025
if (encoderServer.isNodeProcessRunning()) {
    // Server is alive but busy (e.g. tree-sitter parsing): skip the retry.
    return
} else {
    logger.warn(e) { "project context process quit unexpectedly" }
}
Contributor Author

When the process has died, restarting this function does not help.

Contributor Author

The "process quit unexpectedly" issue will be fixed in newer WS LSP releases, so this PR is no longer needed.
