fix(amazonq): properly handle encode server exception #5585
Types of changes
Description
The encoder server is a single-threaded server that may become unresponsive during heavy compute. At the beginning of indexing, the encoder server first parses all js, ts, py, and java files using tree-sitter, then proceeds to build the vector index.
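For illustration only, here is a minimal sketch of why that structure can starve the event loop. This is not the actual encoder server code: the listJavaFiles helper, the repo path, the port, and the placeholder "parse" work are all made up, with the placeholder standing in for the real tree-sitter parsing step.

```typescript
import * as fs from "fs";
import * as http from "http";
import * as path from "path";

// Hypothetical helper: recursively collect .java files under a directory.
function listJavaFiles(dir: string): string[] {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return listJavaFiles(full);
    return entry.name.endsWith(".java") ? [full] : [];
  });
}

// Synchronous, CPU-bound work done file by file on the one and only thread.
// In the real server this is tree-sitter parsing; any sync CPU work behaves
// the same: the event loop cannot run until the loop returns.
function parseAllSync(files: string[]): number {
  let total = 0;
  for (const file of files) {
    const source = fs.readFileSync(file, "utf8");
    total += source.length; // placeholder for parser.parse(source)
  }
  return total;
}

const server = http.createServer((_req, res) => res.end("ok"));
server.listen(3000, () => {
  // While this call runs (minutes for ~25k large .java files), pending TCP
  // connections queue up in the kernel backlog; once the backlog overflows,
  // new clients see "Connection refused", and clients that did connect but
  // get no response hit their read timeout.
  parseAllSync(listJavaFiles("/path/to/large/repo"));
});
```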
I have observed that, if a repo has >25k large java files, for example https://github.com/elastic/elasticsearch (1.4GB), tree-sitter parsing can take 6 minutes 20 seconds. I tried opening the Elasticsearch repo and indexing it 20 times; among these 20 runs, I saw this
java.net.ConnectException: Connection refused
showing up about 10 times. This indicates that the client failed to complete a TCP handshake with the server: because the server is busy parsing files, the Node.js event loop is not free to handle the TCP handshake but is fully occupied with parsing. This also explains why we sometimes see the "project context java.net.SocketTimeoutException: Read timed out." message.
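As a rough way to see the two failure modes side by side, a hypothetical probe script (port, host, and timeout values are made up; it is not part of the toolkit) can hit the busy server repeatedly. An ECONNREFUSED on the Node side is what surfaces as java.net.ConnectException in the JVM client, and a connection whose response never arrives surfaces as java.net.SocketTimeoutException: Read timed out.

```typescript
import * as http from "http";

// Probe the (assumed) encoder server port and classify each failure.
function probe(port: number): Promise<string> {
  return new Promise((resolve) => {
    const req = http.get({ host: "127.0.0.1", port, timeout: 2000 }, (res) => {
      res.resume();
      resolve(`HTTP ${res.statusCode}`);
    });
    // ECONNREFUSED here corresponds to java.net.ConnectException on the JVM side.
    req.on("error", (err) =>
      resolve((err as NodeJS.ErrnoException).code ?? err.message)
    );
    // A stalled response corresponds to java.net.SocketTimeoutException.
    req.on("timeout", () => {
      req.destroy();
      resolve("read timeout");
    });
  });
}

(async () => {
  for (let i = 0; i < 10; i++) console.log(await probe(3000));
})();
```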
The long-running index HTTP call has roughly a 50% chance of failing with connection refused in this test case (25k large java files), and when the connection is refused, the LSP process also quits. The chance of running into this issue is much smaller for small repos, because parsing finishes quickly, and for repos that are not js, ts, py, or java, because we do not parse other languages (which is why this bug is not reproducible in the aws-toolkit-jetbrains Kotlin repo).
When such a connection refused error occurred, we retried indexing, which interrupts the in-progress indexing and performs repetitive indexing; combined with the log looping issue fixed in 3357e88, this contributed to the IDE performance issue.
The vector indexing process already has a "break out of the current event loop" design that pauses indexing and watches OS CPU/memory usage; I have yet to see a
java.net.ConnectException: Connection refused
error while vector indexing.
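A minimal sketch of that kind of yielding design in Node.js follows. It is not the actual implementation: the chunk size, the load-average back-off policy, and the indexChunk placeholder are assumptions.

```typescript
import * as os from "os";
import { setImmediate as yieldToEventLoop } from "timers/promises";

// Hypothetical indexing step; in the real server this would embed a chunk
// of parsed files into the vector index.
async function indexChunk(chunk: string[]): Promise<number> {
  return chunk.reduce((n, file) => n + file.length, 0); // placeholder work
}

// Process work in small chunks and return to the event loop between chunks,
// so pending TCP accepts, reads, and health checks get a chance to run.
async function indexAll(files: string[], chunkSize = 50): Promise<void> {
  for (let i = 0; i < files.length; i += chunkSize) {
    await indexChunk(files.slice(i, i + chunkSize));

    // Yield: lets Node accept queued connections instead of refusing them.
    await yieldToEventLoop();

    // Assumed back-off policy when the machine is under pressure.
    const [load1] = os.loadavg();
    if (load1 > os.cpus().length) {
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}
```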
Checklist
License
I confirm that my contribution is made under the terms of the Apache 2.0 license.