Skip to content

feat: Socket simplification#684

Merged
zeljkoX merged 4 commits intomainfrom
socket-simplification
Mar 4, 2026
Merged

feat: Socket simplification#684
zeljkoX merged 4 commits intomainfrom
socket-simplification

Conversation

@zeljkoX
Copy link
Collaborator

@zeljkoX zeljkoX commented Mar 3, 2026

Summary

This PR will remove socket connection pool logic in favor of simpler model in order to avoid issues with connection reuse and timeouts.

Testing Process

Checklist

  • Add a reference to related issues in the PR description.
  • Add unit tests if applicable.

Note

If you are using Relayer in your stack, consider adding your team or organization to our list of Relayer Users in the Wild!

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved socket connection reliability with automatic reconnection when connections are lost during execution.
    • Enhanced error handling for unavailable plugin services with clearer error messages.
  • Documentation

    • Removed configuration for socket-level timeout settings; timeout behavior now handled through simplified internal logic.
  • Chores

    • Refactored plugin socket connection management for improved stability and maintainability.

@zeljkoX zeljkoX requested a review from a team as a code owner March 3, 2026 13:15
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR removes the socket pooling infrastructure from the plugin execution system, replacing it with a simpler per-execution socket model. Changes span documentation, TypeScript executor implementation, Rust configuration and constants, shared socket service logic, and corresponding tests.

Changes

Cohort / File(s) Summary
Documentation Updates
docs/plugins/index.mdx, plugins/ARCHITECTURE.md
Removed references to PLUGIN_SOCKET_IDLE_TIMEOUT_SECS and PLUGIN_SOCKET_READ_TIMEOUT_SECS environment variables from Timeout Alignment and Fine-Tuning sections.
Core Socket Pool Executor Refactor
plugins/lib/pool-executor.ts
Removed SocketPool, PooledSocket, AcquiredSocket, and SocketClosedError classes; replaced global socket pooling with per-execution sockets. Introduced connectWithRetry() and attemptConnect() methods. Simplified ensureConnected() signature and refactored send()/sendWithRetry() flow. Updated error sanitization to map socket unavailability errors to SERVICE_UNAVAILABLE (503).
Test Updates
plugins/tests/lib/direct-executor.test.ts, plugins/tests/lib/pool-executor-reconnect.test.ts
Removed extensive SocketPool/pooling-related tests and socket age-based eviction tests. Added new regression test validating mid-execution socket reconnection behavior. Updated remaining tests to reflect per-execution socket model without creation timestamp tracking.
Rust Constants & Configuration
src/constants/plugins.rs, src/services/plugins/config.rs
Deleted DEFAULT_SOCKET_IDLE_TIMEOUT_SECS and DEFAULT_SOCKET_READ_TIMEOUT_SECS constants. Removed socket_idle_timeout_secs and socket_read_timeout_secs fields from PluginConfig struct and eliminated related environment variable parsing logic.
Shared Socket Service
src/services/plugins/shared_socket.rs
Removed per-connection idle_timeout and read_timeout fields from SharedSocketService. Replaced per-connection timeouts with single safety timeout derived from DEFAULT_PLUGIN_TIMEOUT_SECONDS + 60. Deleted handle_connection_with_timeout() wrapper and updated handle_connection() signature to remove timeout parameters.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

cla: allowlist

Suggested reviewers

  • tirumerla
  • LuisUrrutia

Poem

🐰 The pools we dug are filled in now,
Fresh sockets spring where old ones plowed,
No more reuse, just pure and clear,
Each execution gets its own frontier! ✨
Connection flows like morning dew,
Simpler pathways, tried and true.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description follows the template structure with Summary and Testing Process sections, confirms unit tests were added, but leaves the issue reference checklist item unchecked.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'Socket simplification' accurately captures the primary change of removing socket pooling and connection reuse infrastructure in favor of a simpler model.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch socket-simplification

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

@zeljkoX zeljkoX changed the title Socket simplification feat: Socket simplification Mar 3, 2026
@codecov
Copy link

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 97.21448% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.94%. Comparing base (99027e5) to head (a03d29a).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/services/plugins/shared_socket.rs 97.21% 10 Missing ⚠️
Additional details and impacted files
Flag Coverage Δ
ai 0.26% <0.00%> (-0.01%) ⬇️
dev 90.93% <97.21%> (+0.02%) ⬆️
properties 0.01% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@            Coverage Diff             @@
##             main     #684      +/-   ##
==========================================
+ Coverage   90.91%   90.94%   +0.02%     
==========================================
  Files         288      288              
  Lines      117665   118002     +337     
==========================================
+ Hits       106981   107315     +334     
- Misses      10684    10687       +3     
Files with missing lines Coverage Δ
src/services/plugins/config.rs 94.37% <ø> (-0.15%) ⬇️
src/services/plugins/shared_socket.rs 95.87% <97.21%> (+0.63%) ⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/plugins/shared_socket.rs (1)

1274-1311: ⚠️ Potential issue | 🟠 Major

test_idle_connection_cleaned_up_by_safety_timeout does not validate cleanup.

This test never asserts timeout-triggered cleanup/closure; it only sleeps briefly and drops the client. It can pass even if safety-timeout logic regresses.

💡 Suggested direction for a deterministic assertion
-    #[tokio::test]
+    #[tokio::test(start_paused = true)]
     async fn test_idle_connection_cleaned_up_by_safety_timeout() {
       ...
-      tokio::time::sleep(Duration::from_millis(200)).await;
-      drop(client);
+      tokio::time::advance(Duration::from_secs(
+          crate::constants::DEFAULT_PLUGIN_TIMEOUT_SECONDS + 61
+      ))
+      .await;
+
+      let write_result = client
+          .write_all((serde_json::to_string(&PluginMessage::Shutdown).unwrap() + "\n").as_bytes())
+          .await;
+      assert!(write_result.is_err(), "Connection should be closed by safety timeout");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/plugins/shared_socket.rs` around lines 1274 - 1311, The test
test_idle_connection_cleaned_up_by_safety_timeout currently never asserts that
the safety-timeout cleanup occurs; update it to deterministically verify the
connection is closed by either (A) configuring the SharedSocketService to use a
much shorter safety timeout for the test (override
DEFAULT_PLUGIN_TIMEOUT_SECONDS or provide a test-only ctor/option) then await
past that timeout and assert the socket is closed (e.g., attempt a read/write on
the UnixStream and expect EOF or an error), or (B) expose/ call the internal
cleanup/check method on SharedSocketService from the test to trigger cleanup
immediately and assert the client receives a disconnect; reference
SharedSocketService::new, register_execution, and the safety timeout constant
(DEFAULT_PLUGIN_TIMEOUT_SECONDS) when locating code to modify. Ensure the test
uses tokio::time::timeout to avoid flakiness and asserts the connection closure
rather than just dropping the client.
🧹 Nitpick comments (2)
src/services/plugins/shared_socket.rs (1)

497-501: Doc comment still describes pooled-socket behavior.

Lines 499-501 mention reused/pooled sockets, but this code path now uses the simplified non-pooled model. This will confuse future maintenance and incident debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/plugins/shared_socket.rs` around lines 497 - 501, The doc
comment above the plugin connection handler in shared_socket.rs still refers to
reused/pooled sockets; update it to reflect the current simplified non-pooled
model by removing references to pooling/reuse and clarifying that the inactivity
timeout resets on each message but connections are not pooled or reused—mention
that connections are per-client and closed when idle rather than retained for
reuse so future maintainers debugging the connection handler (the plugin
connection handling code) won't be misled.
plugins/tests/lib/pool-executor-reconnect.test.ts (1)

166-168: Strengthen unavailable-server assertions to lock in API contract.

This test should also assert the sanitized error code/status (SERVICE_UNAVAILABLE, 503), not only success=false.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/tests/lib/pool-executor-reconnect.test.ts` around lines 166 - 168,
The test currently only asserts result.success is false; update the assertions
after expect(result.success).toBe(false) to also verify the sanitized error code
and status on result.error by adding assertions such as
expect(result.error?.code).toBe('SERVICE_UNAVAILABLE') and
expect(result.error?.status).toBe(503) (or
expect(result.error?.statusCode).toBe(503) if the error uses statusCode),
ensuring you reference the same result variable used in this test block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/lib/pool-executor.ts`:
- Around line 231-239: The race happens because
handleSocketError/handleSocketClose unconditionally clear the shared
connectionPromise while connectWithRetry() may still be running, allowing send()
to start a second connect loop; to fix, make connectionPromise the single source
of truth and only clear or replace it when its specific promise settles — e.g.,
in ensureConnected() set connectionPromise = connectWithRetry(); in
connectWithRetry() ensure it resolves/rejects without external clears, and
update handleSocketError/handleSocketClose to compare the current
connectionPromise to the promise they observed before clearing (or set a
separate reconnecting flag) so they only clear/reset connectionPromise when it
matches the in-flight promise or when that promise has settled; update
references in ensureConnected, connectWithRetry, handleSocketError,
handleSocketClose and send accordingly.

In `@plugins/tests/lib/pool-executor-reconnect.test.ts`:
- Around line 117-128: The current nested setTimeout/setInterval polling around
destroyFirstConnection is flaky; replace the wall-clock polling with a
deterministic trigger tied to the first request/response event: wait for the
first request to be completed (or subscribe to the pool/executor
response/acquire event used in the test) and call destroyFirstConnection()
immediately after that event instead of using the checkInterval/20ms
delays—locate the timers surrounding destroyFirstConnection, remove the
setInterval/setTimeout logic, and invoke destroyFirstConnection from the
first-response callback or resolved promise produced by the test's request
invocation.

In `@src/services/plugins/shared_socket.rs`:
- Around line 540-546: The safety_timeout currently uses
DEFAULT_PLUGIN_TIMEOUT_SECONDS and can close valid long-running requests; change
the calculation to use the configured request timeout
(PLUGIN_POOL_REQUEST_TIMEOUT_SECS) and take the max between that and
DEFAULT_PLUGIN_TIMEOUT_SECONDS (then add the +60 margin) so the guard never
fires earlier than the request budget; update the safety_timeout assignment (and
the analogous logic around lines 548-563) to
Duration::from_secs(std::cmp::max(crate::constants::DEFAULT_PLUGIN_TIMEOUT_SECONDS,
crate::constants::PLUGIN_POOL_REQUEST_TIMEOUT_SECS) + 60) so it respects
configured timeouts.

---

Outside diff comments:
In `@src/services/plugins/shared_socket.rs`:
- Around line 1274-1311: The test
test_idle_connection_cleaned_up_by_safety_timeout currently never asserts that
the safety-timeout cleanup occurs; update it to deterministically verify the
connection is closed by either (A) configuring the SharedSocketService to use a
much shorter safety timeout for the test (override
DEFAULT_PLUGIN_TIMEOUT_SECONDS or provide a test-only ctor/option) then await
past that timeout and assert the socket is closed (e.g., attempt a read/write on
the UnixStream and expect EOF or an error), or (B) expose/ call the internal
cleanup/check method on SharedSocketService from the test to trigger cleanup
immediately and assert the client receives a disconnect; reference
SharedSocketService::new, register_execution, and the safety timeout constant
(DEFAULT_PLUGIN_TIMEOUT_SECONDS) when locating code to modify. Ensure the test
uses tokio::time::timeout to avoid flakiness and asserts the connection closure
rather than just dropping the client.

---

Nitpick comments:
In `@plugins/tests/lib/pool-executor-reconnect.test.ts`:
- Around line 166-168: The test currently only asserts result.success is false;
update the assertions after expect(result.success).toBe(false) to also verify
the sanitized error code and status on result.error by adding assertions such as
expect(result.error?.code).toBe('SERVICE_UNAVAILABLE') and
expect(result.error?.status).toBe(503) (or
expect(result.error?.statusCode).toBe(503) if the error uses statusCode),
ensuring you reference the same result variable used in this test block.

In `@src/services/plugins/shared_socket.rs`:
- Around line 497-501: The doc comment above the plugin connection handler in
shared_socket.rs still refers to reused/pooled sockets; update it to reflect the
current simplified non-pooled model by removing references to pooling/reuse and
clarifying that the inactivity timeout resets on each message but connections
are not pooled or reused—mention that connections are per-client and closed when
idle rather than retained for reuse so future maintainers debugging the
connection handler (the plugin connection handling code) won't be misled.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16dc170 and 3091f6c.

📒 Files selected for processing (8)
  • docs/plugins/index.mdx
  • plugins/ARCHITECTURE.md
  • plugins/lib/pool-executor.ts
  • plugins/tests/lib/direct-executor.test.ts
  • plugins/tests/lib/pool-executor-reconnect.test.ts
  • src/constants/plugins.rs
  • src/services/plugins/config.rs
  • src/services/plugins/shared_socket.rs
💤 Files with no reviewable changes (2)
  • src/constants/plugins.rs
  • plugins/ARCHITECTURE.md

Copy link
Contributor

@NicoMolinaOZ NicoMolinaOZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@zeljkoX zeljkoX merged commit ee494f6 into main Mar 4, 2026
26 checks passed
@zeljkoX zeljkoX deleted the socket-simplification branch March 4, 2026 06:56
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants