Commit 97bd0bd

Merge pull request #4015 from nhsuk/adr_for_service_discovery
Create ADR detailing choice of framework for B2B communication
2 parents 6774df9 + 8cc89b0 commit 97bd0bd

1 file changed: 77 additions & 0 deletions

@@ -0,0 +1,77 @@
# 14. Use AWS Cloud Map for Service Discovery

Date: 2025-07-21

## Status

Accepted

## Context

The introduction of the reporting service turns Mavis into an application made up of multiple ECS services that must
communicate with each other internally for data access and processing. We therefore need a scalable and reliable
service discovery mechanism to support this architectural change.

## Considered Options

We evaluated several service discovery approaches for ECS inter-service communication, prioritizing CodeDeploy
compatibility for blue-green deployments and scalability for future services.

### Option 1: Internal Application Load Balancer (ALB)

Use an internal ALB for routing via path/host rules, with ECS-integrated target groups for dynamic registration.

- **Pros**: Includes load balancing and health checks.
- **Cons**: Incompatible with CodeDeploy's task set management (max one target group per ECS service);
  adds SSL and rule overhead.

Rejected because of the CodeDeploy deployment incompatibility.

### Option 2: Amazon ECS Service Connect

Managed ECS discovery with DNS, load balancing, and metrics, built on Cloud Map.

- **Pros**: Easy setup with failover and telemetry; implementing TLS/SSL is straightforward.
- **Cons**: Requires the `ECS` deployment controller, which conflicts with our CodeDeploy blue-green deployments.

Rejected because it is incompatible with CodeDeploy.

### Option 3: AWS Cloud Map (Service Discovery)

Register services in a private DNS namespace for resolution (e.g., `web.mavis.${environment}.aws-int`), using MULTIVALUE
routing.

- **Pros**: CodeDeploy-compatible; lightweight and DNS-based; ECS-integrated registration.
- **Cons**: No built-in load balancing; needs manual security group rules; implementing TLS/SSL adds complexity.

Selected because it meets the requirements.
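
Because MULTIVALUE routing returns several task IPs and Cloud Map does no load balancing itself, the caller ends up
choosing which address to use. The following is a minimal Python sketch of that idea, not the Mavis implementation:
the function names `resolve_ipv4` and `pick_task_ip` and the random selection strategy are assumptions made for
illustration, and the hostname in the usage note assumes a hypothetical `production` environment.

```python
import random
import socket
from typing import Callable, List, Sequence


def resolve_ipv4(hostname: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def pick_task_ip(
    hostname: str,
    port: int,
    resolver: Callable[[str, int], List[str]] = resolve_ipv4,
    chooser: Callable[[Sequence[str]], str] = random.choice,
) -> str:
    """Pick one address from a multi-value DNS answer (client-side selection)."""
    addresses = resolver(hostname, port)
    if not addresses:
        raise LookupError(f"no addresses returned for {hostname}")
    return chooser(addresses)
```

A caller might use `pick_task_ip("web.mavis.production.aws-int", 4000)`. In practice many HTTP clients and OS
resolvers already rotate between returned records, so an explicit helper like this may be unnecessary.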

### Comparison

Given the requirement for blue-green deployments, AWS Cloud Map was the only viable option: it offers a simple DNS-based
service discovery mechanism that integrates well with both ECS and CodeDeploy.

## Decision

We will use AWS Cloud Map (Service Discovery) to enable service-to-service communication. This involves creating a
private DNS namespace within the VPC and registering ECS services (e.g., the web service) with Cloud Map. Services can
then resolve each other using DNS names (e.g., `web.mavis.${environment}.aws-int`), allowing dynamic IP resolution for
tasks.

- A private DNS namespace (`mavis.${environment}.aws-int`) will be provisioned.
- The web service will be registered with a MULTIVALUE routing policy to support multiple tasks.
- Security group rules will explicitly allow ingress/egress between services
  (e.g., reporting service to web service on port 4000).
- This approach integrates with Terraform for infrastructure management and does not conflict with CodeDeploy.
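
To make the naming convention concrete, here is a small Python sketch that builds the internal hostname and base URL
from a service name and environment. The helper names, the `production` environment value, and the plain `http`
scheme are assumptions for illustration only; the ADR specifies just the `mavis.${environment}.aws-int` namespace and
port 4000 for the web service.

```python
import re

# A single DNS label: letters, digits, and hyphens, not starting or ending with a hyphen.
_DNS_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def internal_hostname(service: str, environment: str) -> str:
    """Build `<service>.mavis.<environment>.aws-int` from validated labels."""
    for label in (service, environment):
        if not _DNS_LABEL.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return f"{service}.mavis.{environment}.aws-int"


def internal_base_url(
    service: str, environment: str, port: int, scheme: str = "http"
) -> str:
    """Build the base URL another service would call (scheme is an assumption)."""
    return f"{scheme}://{internal_hostname(service, environment)}:{port}"
```

For example, `internal_base_url("web", "production", 4000)` yields `http://web.mavis.production.aws-int:4000`, which
only resolves from inside the VPC that owns the private namespace.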

## Consequences

- Services will dynamically discover each other via DNS, improving scalability and reducing configuration drift.
- Additional Terraform resources (e.g., `aws_service_discovery_private_dns_namespace` and
  `aws_service_discovery_service`) will be maintained, increasing infrastructure complexity slightly but providing
  better automation.
- DNS caching (TTL set to 10 seconds initially) may delay how quickly callers see changes during task scaling or
  failures; this can be tuned based on monitoring.
- Alignment with AWS-native services ensures compatibility with future enhancements but requires monitoring DNS
  resolution metrics to detect issues.
- No changes to application code are needed beyond using the resolved DNS names for internal calls.
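
The effect of the 10-second TTL can be illustrated with a small Python sketch of a TTL-bounded cache: a caller may keep
using a previous answer until the TTL expires, which is the window in which it can still see tasks that have since
stopped. The class name `TtlDnsCache` and its injectable `resolver` and `clock` parameters are hypothetical, and real
resolvers and HTTP clients implement their own caching.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlDnsCache:
    """Cache DNS answers for at most ttl_seconds before resolving again."""

    def __init__(
        self,
        resolver: Callable[[str], List[str]],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        # hostname -> (expiry time, cached addresses)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, hostname: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[0]:
            # Still within the TTL: this answer may be stale if tasks changed.
            return cached[1]
        addresses = self._resolver(hostname)
        self._entries[hostname] = (now + self._ttl, addresses)
        return addresses
```

Lowering the TTL shortens the stale window at the cost of more DNS queries, which is the trade-off to watch when tuning.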
