test(websocket): add tests reproducing 1005 reconnect loop failover bug #703

Draft
cl-efornaciari wants to merge 1 commit into main from
feature/ws-1005-reconnect-loop-tests

@cl-efornaciari
Contributor

Summary

  • Adds 3 unit tests to test/transports/websocket.test.ts that reproduce the WebSocket 1005 reconnect loop bug where connectionOpenedAt resets on each reconnect, defeating the failover counter
  • Test 1 ("failover counter does not increment during rapid external close loop"): demonstrates the core bug — when a server drops connections with code 1005, the failover counter (streamHandlerInvocationsWithNoConnection) stays at 0 because timeSinceConnectionOpened resets on each reconnect, never exceeding WS_SUBSCRIPTION_UNRESPONSIVE_TTL
  • Test 2 ("failover counter increments for unresponsive-but-open connections"): control test showing the counter correctly increments when the connection stays open but sends no data — proving the failover mechanism works for the designed scenario but not for rapid external closes
  • Test 3 ("EA stalls during rapid reconnect loop and never recovers"): end-to-end test showing the user-visible symptom — prices initially flow, then the server starts dropping with 1005, cached prices expire, and the EA returns 504 indefinitely while trapped reconnecting to the same URL

Root Cause

WebSocketTransport.streamHandler resets the connectionOpenedAt timestamp to Date.now() on every successful reconnect (line 440 of websocket.ts). The connectionUnresponsive check compares Math.min(timeSinceLastMessage, timeSinceConnectionOpened) against WS_SUBSCRIPTION_UNRESPONSIVE_TTL (default 120s). In a rapid close/reconnect cycle, timeSinceConnectionOpened (~5s per cycle) is always the smaller of the two values, so the minimum never exceeds the TTL. The failover counter therefore never increments, and wsSelectUrl stays locked on the failing primary URL.
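
For reviewers who want to see the arithmetic, here is a minimal, self-contained sketch of the check described above. It is not the framework code: the ConnectionState shape, the isConnectionUnresponsive and simulate helpers, the 1s handler interval, and the 10-minute window are assumptions made for illustration. Only the Math.min comparison, the 120s default, and the ~5s reconnect cycle come from this description.

```typescript
// Simplified model of the unresponsiveness check. Hypothetical names throughout;
// the real logic lives in WebSocketTransport.streamHandler and may differ.
const WS_SUBSCRIPTION_UNRESPONSIVE_TTL = 120_000 // ms (default of 120s per this PR)

interface ConnectionState {
  connectionOpenedAt: number
  lastMessageReceivedAt: number
  streamHandlerInvocationsWithNoConnection: number
}

// True only when BOTH elapsed times exceed the TTL, because Math.min picks the smaller one.
function isConnectionUnresponsive(state: ConnectionState, now: number): boolean {
  const timeSinceLastMessage = now - state.lastMessageReceivedAt
  const timeSinceConnectionOpened = now - state.connectionOpenedAt
  return (
    Math.min(timeSinceLastMessage, timeSinceConnectionOpened) >
    WS_SUBSCRIPTION_UNRESPONSIVE_TTL
  )
}

// Walk a fake clock from 0 to totalMs, running the check every tickMs.
// If reconnectEveryMs is given, the connection is closed and reopened on that
// interval, which resets connectionOpenedAt (the behavior under test).
// No messages are ever received in either scenario.
function simulate(totalMs: number, tickMs: number, reconnectEveryMs?: number): number {
  const state: ConnectionState = {
    connectionOpenedAt: 0,
    lastMessageReceivedAt: 0,
    streamHandlerInvocationsWithNoConnection: 0,
  }
  for (let now = tickMs; now <= totalMs; now += tickMs) {
    if (reconnectEveryMs !== undefined && now % reconnectEveryMs === 0) {
      state.connectionOpenedAt = now // reset on each successful reconnect
    }
    if (isConnectionUnresponsive(state, now)) {
      state.streamHandlerInvocationsWithNoConnection += 1
    }
  }
  return state.streamHandlerInvocationsWithNoConnection
}

// Ten minutes of silence from the provider in both cases.
const rapidCloseLoop = simulate(600_000, 1_000, 5_000) // server drops the socket every ~5s
const openButSilent = simulate(600_000, 1_000) // socket stays open, no data

console.log({ rapidCloseLoop, openButSilent })
```

In this model the rapid close loop never increments the counter, while the open-but-silent connection starts incrementing once 120s have passed. That matches the contrast between Test 1 and Test 2. The sketch deliberately leaves out what the framework does after the counter increments, since that is not covered by this description.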

This was introduced in framework PR #614 (the URL failover mechanism) and affects all Tiingo adapter versions from 2.9.0 onward. See rca-tiingo-ws-1005.md for the full RCA with version mapping.

Test plan

  • All 3 new tests pass individually
  • Full websocket test suite passes (15/15 tests)
  • Review that the test assertions match the described bug behavior

Made with Cursor

Add 3 tests demonstrating how the connectionOpenedAt reset defeats the
failover counter when WebSocket connections are rapidly closed externally
(code 1005), trapping the EA on the same failing URL indefinitely.

- Test 1: failover counter stays at 0 during rapid 1005 close/reconnect
- Test 2: control test showing counter increments for open-but-unresponsive
- Test 3: end-to-end showing EA stalls and stops serving prices
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

NPM Publishing labels 🏷️

🛑 This PR needs labels to indicate how to increase the current package version in the automated workflows. Please add one of the following labels: none, patch, minor, or major.
