Commit ef4f7d7

Merge pull request #6 from luismr/feature/decoupled-websocket-events-publisher
Implement Decoupled WebSocket Events Publisher with Kafka
2 parents 9e9d4e0 + 814900a commit ef4f7d7

22 files changed, with 13592 additions and 730 deletions

README.md

Lines changed: 157 additions & 245 deletions
Large diffs are not rendered by default.
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# ADR 001 - WebSocket Notification Scalability Strategy
2+
3+
## Context
4+
The `flight-tracker-event-server` currently sends flight updates directly to clients over WebSocket. Sessions are kept in memory, which limits horizontal scalability, requires sticky sessions in the load balancer, and makes managing thousands of connections inefficient.
5+
6+
## Decision
7+
Refactor the application so that the `PingEventPublisher` publishes flight events to a Kafka topic. A new dedicated component will consume these events and manage active WebSocket sessions for message delivery.
8+
9+
Additionally, we will implement feature flags to control the WebSocket delivery mechanism, allowing the system to run in different deployment configurations:
10+
- **Monolithic Mode**: All components run in the same service with in-memory WebSocket sessions
11+
- **Decoupled Mode**: WebSocket delivery runs as a separate component consuming from Kafka
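
The two modes above could be selected by a single flag read at startup. The following is a minimal sketch only: the property name `websocket.delivery.mode`, the flag values, and all class names are hypothetical, since this ADR does not specify how the flag is configured. Both delivery paths are simulated with in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FeatureFlagDemo {
    public static void main(String[] args) {
        // Hypothetical property name; e.g. -Dwebsocket.delivery.mode=decoupled
        DeliveryMode mode = DeliveryMode.fromFlag(System.getProperty("websocket.delivery.mode"));
        InMemorySink inMemory = new InMemorySink();
        TopicSink topic = new TopicSink();
        new DeliveryRouter(mode, inMemory, topic).publish("flight AB123 position update");
        System.out.println(mode + ": in-memory=" + inMemory.delivered + ", topic=" + topic.published);
    }
}

// Hypothetical flag values mirroring the two modes named in the ADR.
enum DeliveryMode {
    MONOLITHIC, DECOUPLED;

    // Parses a flag value, falling back to MONOLITHIC when the flag is unset or blank.
    // Unknown values raise IllegalArgumentException via valueOf.
    static DeliveryMode fromFlag(String value) {
        if (value == null || value.isBlank()) {
            return MONOLITHIC;
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

interface EventSink {
    void accept(String event);
}

// Monolithic mode: events go straight to in-memory sessions (simulated by a list).
class InMemorySink implements EventSink {
    final List<String> delivered = new ArrayList<>();
    public void accept(String event) { delivered.add(event); }
}

// Decoupled mode: events are handed to a topic for a separate component (simulated by a list).
class TopicSink implements EventSink {
    final List<String> published = new ArrayList<>();
    public void accept(String event) { published.add(event); }
}

// Routes each event to exactly one sink, chosen once from the flag.
class DeliveryRouter {
    private final DeliveryMode mode;
    private final EventSink inMemory;
    private final EventSink topic;

    DeliveryRouter(DeliveryMode mode, EventSink inMemory, EventSink topic) {
        this.mode = mode;
        this.inMemory = inMemory;
        this.topic = topic;
    }

    void publish(String event) {
        (mode == DeliveryMode.DECOUPLED ? topic : inMemory).accept(event);
    }
}
```

Defaulting to the monolithic path when the flag is absent is a design choice of this sketch; it keeps existing deployments unchanged until the flag is set explicitly.
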
12+
13+
This decision enables separation of responsibilities and prepares the system for a future migration to a STOMP-based architecture (e.g., RabbitMQ) if scale demands increase.
14+
15+
## Justification
16+
- **Time-to-market**: quick delivery without frontend changes
17+
- **Decoupling**: separates event generation from delivery logic
18+
- **Reuses existing infrastructure**: Kafka is already in place
19+
- **Low impact**: avoids protocol changes or client modifications for now
20+
- **Deployment flexibility**: allows gradual transition between deployment models
21+
- **Reduced complexity**: initial implementation can stay within the same service
22+
23+
## Alternatives Considered
24+
- **STOMP Broker with RabbitMQ**: powerful but requires client refactor and more setup
25+
- **Redis Streams/PubSub**: simple, fast, but with delivery and clustering limitations
26+
- **Optimizing the current implementation**: fast but not future-proof
27+
- **Immediate Service Split**: would require more upfront work and coordination
28+
29+
## Consequences
30+
- The system becomes modular and more scalable
31+
- Kafka retains events in the topic, so the delivery component can consume at its own pace (backpressure) and resume after a failure (failover)
32+
- Prepares for a transition to STOMP when scale justifies it
33+
- A new WebSocket delivery component must be monitored closely
34+
- Feature flags add complexity but provide deployment flexibility
35+
- Allows for gradual migration of WebSocket handling to a separate service
36+
37+
## Links
38+
- [Technical Analysis – WebSocket Scalability](../analysis/technical-analysis-websocket-flight-tracker.md)
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
# WebSocket Notification Scalability
2+
3+
## Context
4+
The **Flight Tracker** project provides users with real-time flight information via **WebSocket** notifications. Currently, the application maintains active WebSocket connections and sends updates directly. As the number of users and simultaneous events increases, the current solution is reaching **scalability limits**, primarily due to in-memory session storage and sticky session dependencies on the load balancer.
5+
6+
## Objective
7+
This document analyzes architectural alternatives for scaling WebSocket notification delivery while maintaining low latency, high reliability, and minimal disruption to the existing system. The goal is an evolutionary solution with fast time-to-market and a path toward future scalability.
8+
9+
## Requirements
10+
11+
### Non-functional Requirements
12+
- Scalability to thousands of simultaneous WebSocket connections
13+
- Low delivery latency (< 100 ms)
14+
- High availability and fault tolerance
15+
- Reliable message delivery
16+
- Minimal frontend impact (initially)
17+
18+
### Technical Requirements
19+
- Integration with existing Kafka infrastructure
20+
- Support for component decoupling
21+
- Compatibility with WebSocket and STOMP
22+
- Easy monitoring and operations
23+
24+
## Considered Alternatives
25+
26+
### 1. Kafka + ThreadExecutors
27+
Decouple event dispatching via Kafka and parallelize delivery using thread pools. No frontend changes required.
28+
29+
**Pros:** scalable, no changes to the client, uses existing Kafka
30+
**Cons:** requires concurrency implementation and session control
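
The thread-pool side of this alternative can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `ClientSession`, `RecordingSession`, and `ParallelDispatcher` are hypothetical names, and sessions are simulated in memory rather than backed by real WebSocket connections.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelDispatchDemo {
    public static void main(String[] args) {
        ParallelDispatcher dispatcher = new ParallelDispatcher(4);
        RecordingSession alice = new RecordingSession("alice");
        RecordingSession bob = new RecordingSession("bob");
        dispatcher.register(alice);
        dispatcher.register(bob);
        dispatcher.broadcast("AB123 landed");
        dispatcher.shutdown(2000);
        System.out.println(alice.received + " " + bob.received);
    }
}

// Hypothetical stand-in for a WebSocket session.
interface ClientSession {
    String id();
    void send(String message) throws Exception;
}

// Test double that records what it was sent.
class RecordingSession implements ClientSession {
    private final String id;
    final List<String> received = new CopyOnWriteArrayList<>();
    RecordingSession(String id) { this.id = id; }
    public String id() { return id; }
    public void send(String message) { received.add(message); }
}

class ParallelDispatcher {
    private final Map<String, ClientSession> sessions = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    ParallelDispatcher(int threads) { this.pool = Executors.newFixedThreadPool(threads); }

    void register(ClientSession session) { sessions.put(session.id(), session); }
    void unregister(String id) { sessions.remove(id); }

    // Queues one send task per session, so a slow client ties up one pool thread
    // instead of stalling the whole broadcast loop.
    void broadcast(String message) {
        for (ClientSession session : sessions.values()) {
            pool.execute(() -> {
                try {
                    session.send(message);
                } catch (Exception e) {
                    // A failed send drops the session; a real component would also log it.
                    sessions.remove(session.id());
                }
            });
        }
    }

    // Stops accepting work and waits for queued sends; returns false on timeout or interrupt.
    boolean shutdown(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because each send is an independent task, this sketch does not preserve per-session ordering across successive broadcasts. Handling that is part of the concurrency and session-control cost listed under the cons.
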
31+
32+
### 2. STOMP Broker (RabbitMQ/ActiveMQ)
33+
Use a STOMP-compatible message broker as an external relay. Frontends subscribe to STOMP topics via WebSocket.
34+
35+
**Pros:** complete decoupling, mature pub/sub model
36+
**Cons:** requires frontend refactoring and broker setup
37+
38+
### 3. Redis Streams/PubSub
39+
Use Redis for message publishing/subscribing or streams. Messages are distributed across WebSocket server instances.
40+
41+
**Pros:** simple, fast, great for low latency
42+
**Cons:** pub/sub doesn't guarantee delivery; streams require additional handling
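
The delivery difference between the two Redis options can be illustrated with a small in-memory model. This is not Redis client code, and the class names are hypothetical. It only shows the semantics: with pub/sub, a subscriber that is disconnected at publish time never sees the message, while a stream retains entries so a consumer can resume from a stored position (the "additional handling").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DeliverySemanticsDemo {
    public static void main(String[] args) {
        FireAndForgetChannel channel = new FireAndForgetChannel();
        List<String> received = new ArrayList<>();
        Consumer<String> subscriber = received::add;
        channel.subscribe(subscriber);
        channel.publish("m1");
        channel.unsubscribe(subscriber);   // simulated disconnect
        channel.publish("m2");             // nobody is listening, so m2 is lost
        System.out.println("pub/sub received: " + received);

        AppendOnlyStream stream = new AppendOnlyStream();
        stream.append("m1");
        stream.append("m2");
        stream.append("m3");
        // A consumer that had processed one entry before disconnecting resumes at position 1.
        System.out.println("stream resumed with: " + stream.readFrom(1));
    }
}

// Pub/sub model: a message reaches only the subscribers connected at publish time.
class FireAndForgetChannel {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }
    void unsubscribe(Consumer<String> subscriber) { subscribers.remove(subscriber); }

    // Returns how many subscribers received the message; zero means it was dropped.
    int publish(String message) {
        subscribers.forEach(s -> s.accept(message));
        return subscribers.size();
    }
}

// Stream model: entries are retained and the consumer tracks its own read position.
class AppendOnlyStream {
    private final List<String> entries = new ArrayList<>();

    void append(String message) { entries.add(message); }

    // Returns a copy of the entries from the given position (clamped to the valid range) onward.
    List<String> readFrom(int position) {
        int start = Math.max(0, Math.min(position, entries.size()));
        return new ArrayList<>(entries.subList(start, entries.size()));
    }
}
```
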
43+
44+
### 4. Optimizing the Current Architecture
45+
Local improvements with thread pools, async I/O, or sticky session-based load balancing.
46+
47+
**Pros:** low cost, fast implementation
48+
**Cons:** doesn't solve horizontal scalability limitations
49+
50+
## Comparative Analysis
51+
52+
| Solution             | Scalability | Latency  | Complexity | Reliability | Frontend Impact |
|----------------------|-------------|----------|------------|-------------|-----------------|
| Kafka + Threads      | High        | Medium   | Medium     | High        | None            |
| STOMP Broker         | High        | Low      | High       | High        | High            |
| Redis Streams/PubSub | Medium      | Very Low | Medium     | Medium/High | None            |
| Local Optimization   | Low         | Low      | Low        | Low         | None            |
58+
59+
## Recommendation
60+
61+
We recommend an **evolutionary approach in two phases**:
62+
63+
### Phase 1: Refactoring with Kafka + Dedicated WebSocket Component
64+
65+
Refactor the `PingEventPublisher` to publish events to a Kafka topic. A new (or existing) component will consume the topic and handle active WebSocket session management and message delivery.
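
The shape of this refactoring can be sketched with a `BlockingQueue` standing in for the Kafka topic. Everything here is an assumption made for illustration: the class names are hypothetical, sessions are plain callbacks, and the queue has none of Kafka's partitions, offsets, or persistence. The point is only the separation: the publisher knows the topic but not the sessions, and the delivery component owns the sessions.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class DecoupledDeliveryDemo {
    // Publishes two events and returns what the single simulated session received.
    static List<String> run() {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        TopicEventPublisher publisher = new TopicEventPublisher(topic);
        WebSocketDeliveryWorker worker = new WebSocketDeliveryWorker(topic);
        List<String> received = new CopyOnWriteArrayList<>();
        worker.addSession(received::add);

        Thread consumer = new Thread(worker, "ws-delivery");
        consumer.start();
        publisher.publish("AB123 departed");
        publisher.publish("AB123 landed");
        worker.stop();   // the worker still drains what is already queued
        try {
            consumer.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Stand-in for the refactored publisher: it writes to the topic and nothing else.
class TopicEventPublisher {
    private final BlockingQueue<String> topic;
    TopicEventPublisher(BlockingQueue<String> topic) { this.topic = topic; }
    void publish(String flightEvent) { topic.add(flightEvent); }
}

// Stand-in for the delivery component: consumes the topic and fans out to its sessions.
class WebSocketDeliveryWorker implements Runnable {
    private final BlockingQueue<String> topic;
    private final List<Consumer<String>> sessions = new CopyOnWriteArrayList<>();
    private volatile boolean running = true;

    WebSocketDeliveryWorker(BlockingQueue<String> topic) { this.topic = topic; }

    void addSession(Consumer<String> session) { sessions.add(session); }
    void stop() { running = false; }

    @Override
    public void run() {
        try {
            // Keep polling until stop() was called and the queue has been drained.
            while (running || !topic.isEmpty()) {
                String event = topic.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    sessions.forEach(session -> session.accept(event));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```
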
66+
67+
**Benefits:**
68+
- Fast time-to-market
69+
- Immediate scalability using current infrastructure
70+
- No frontend changes
71+
- Improves modularity and observability
72+
73+
### Phase 2: Evolve to STOMP Broker
74+
75+
If scalability needs increase significantly, migrate to a **STOMP-based broker architecture** (RabbitMQ or ActiveMQ), enabling:
76+
- Topic-based subscriptions
77+
- Automatic message distribution by the broker
78+
- Event-driven backend/frontend
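
For orientation, a topic-based subscription in STOMP means the client sends a `SUBSCRIBE` frame naming a destination, and the broker then routes matching messages to it. Below is a minimal sketch of building such a frame following the STOMP 1.2 frame layout (command line, headers, blank line, body, NUL terminator). The `StompFrames` class and the destination `/topic/flights` are hypothetical, and a real frontend would normally use a STOMP client library rather than formatting frames by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StompFrames {
    public static void main(String[] args) {
        // Replace the NUL terminator with a visible marker for printing only.
        System.out.println(subscribe("sub-0", "/topic/flights").replace("\0", "^@"));
    }

    // Escapes header text as STOMP 1.2 requires for frames other than CONNECT/CONNECTED.
    static String escape(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\r", "\\r")
                    .replace("\n", "\\n")
                    .replace(":", "\\c");
    }

    // Serializes a frame: COMMAND, header lines, an empty line, the body, then a NUL byte.
    static String frame(String command, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder(command).append('\n');
        headers.forEach((name, value) ->
                sb.append(escape(name)).append(':').append(escape(value)).append('\n'));
        return sb.append('\n').append(body).append('\0').toString();
    }

    // SUBSCRIBE requires an `id` header (unique per connection) and a `destination` header.
    static String subscribe(String subscriptionId, String destination) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("id", subscriptionId);
        headers.put("destination", destination);
        return frame("SUBSCRIBE", headers, "");
    }
}
```
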
79+
80+
This phase requires more effort and frontend changes, so it's reserved for future growth that justifies the investment.
81+
82+
### Why this phased approach?
83+
84+
- **Time-to-market**: quick delivery with low risk
85+
- **Low disruption**: avoids major changes to frontend/backend for now
86+
- **Preparation**: creates a foundation for future pub/sub migration
87+
88+
This shows how **architecture can evolve with minimal impact** while aligning with team capacity and business context.
89+
90+
## References
91+
92+
- [ADR 001 - WebSocket Notification Scalability Strategy](../adrs/adr-001-websocket-scalability.md)
93+
- [ByteWise010, *“Scaling WebSockets with STOMP and RabbitMQ”*](https://medium.com/@bytewise010/scaling-websocket-messaging-with-spring-boot-e9877c80f027)
94+
- Internal benchmarking experience with Kafka and Redis
