|
| 1 | +# 14. Use AWS Cloud Map for Service Discovery |
| 2 | + |
| 3 | +Date: 2025-07-21 |
| 4 | + |
| 5 | +## Status |
| 6 | + |
| 7 | +Accepted |
| 8 | + |
| 9 | +## Context |
| 10 | + |
| 11 | +Introducing the reporting service turns the Mavis application into multiple ECS services that must |
| 12 | +communicate with each other internally for data access and processing. We therefore need a scalable, reliable |
| 13 | +service discovery mechanism to support this architectural change. |
| 14 | + |
| 15 | +## Considered Options |
| 16 | + |
| 17 | +We evaluated several service discovery approaches for ECS inter-service communication, prioritizing CodeDeploy |
| 18 | +compatibility for blue-green deployments and scalability for future services. |
| 19 | + |
| 20 | +### Option 1: Internal Application Load Balancer (ALB) |
| 21 | + |
| 22 | +Use an internal ALB for routing via path/host rules, with ECS-integrated target groups for dynamic registration. |
| 23 | + |
| 24 | +- **Pros**: Includes load balancing and health checks. |
| 25 | +- **Cons**: Incompatible with CodeDeploy's task set management (an ECS service under CodeDeploy supports at most |
| 26 | + one target group); adds SSL and routing-rule overhead. |
| 27 | + |
| 28 | +Rejected because it is incompatible with CodeDeploy deployments. |
| 29 | + |
| 30 | +### Option 2: Amazon ECS Service Connect |
| 31 | + |
| 32 | +Managed ECS discovery with DNS, load balancing, and metrics, built on Cloud Map. |
| 33 | + |
| 34 | +- **Pros**: Easy setup with failover and telemetry; implementing TLS/SSL is straightforward. |
| 35 | +- **Cons**: Requires the ECS deployment controller, which conflicts with CodeDeploy blue-green deployments. |
| 36 | + |
| 37 | +Rejected because it is incompatible with CodeDeploy. |
| 38 | + |
| 39 | +### Option 3: AWS Cloud Map (Service Discovery) |
| 40 | + |
| 41 | +Register services in a private DNS namespace for resolution (e.g., `web.mavis.${environment}.aws-int`), using MULTIVALUE |
| 42 | +routing. |
| 43 | + |
| 44 | +- **Pros**: CodeDeploy-compatible; lightweight DNS-based; ECS-integrated registration. |
| 45 | +- **Cons**: No built-in load balancing; needs manual security group rules; implementing TLS/SSL adds complexity. |
| 46 | + |
| 47 | +Selected because it meets the requirements. |
| 48 | + |
| 49 | +### Comparison |
| 50 | + |
| 51 | +Given the requirement for blue-green deployments, AWS Cloud Map was the only viable option: it offers simple |
| 52 | +DNS-based service discovery that integrates well with ECS and CodeDeploy. |
| 53 | + |
| 54 | +## Decision |
| 55 | + |
| 56 | +We will use AWS Cloud Map (Service Discovery) to enable service-to-service communication. This involves creating a |
| 57 | +private DNS namespace within the VPC and registering ECS services (e.g., the web service) with Cloud Map. Services can |
| 58 | +then resolve each other using DNS names (e.g., `web.mavis.${environment}.aws-int`), allowing dynamic IP resolution for |
| 59 | +tasks. |
| 60 | + |
| 61 | +- A private DNS namespace (`mavis.${environment}.aws-int`) will be provisioned. |
| 62 | +- The web service will be registered with a MULTIVALUE routing policy to support multiple tasks. |
| 63 | +- Security group rules will explicitly allow ingress/egress between services |
| 64 | + (e.g., reporting service to web service on port 4000). |
| 65 | +- This setup can be managed in Terraform alongside the existing infrastructure and does not conflict with CodeDeploy. |
| 66 | + |
| 67 | +## Consequences |
| 68 | + |
| 69 | +- Services will dynamically discover each other via DNS, improving scalability and reducing configuration drift. |
| 70 | +- Additional Terraform resources (e.g., `aws_service_discovery_private_dns_namespace` and |
| 71 | + `aws_service_discovery_service`) will be maintained, increasing infrastructure complexity slightly but providing |
| 72 | + better automation. |
| 73 | +- DNS caching (TTL initially set to 10 seconds) means clients may briefly keep resolving to stale task IPs during |
| 74 | + task scaling or failures; the TTL can be tuned based on monitoring. |
| 75 | +- Using AWS-native services should keep us compatible with future enhancements, but DNS resolution metrics must be |
| 76 | + monitored to detect issues. |
| 77 | +- No changes to application code are needed beyond using the resolved DNS names for internal calls. |
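
The Cloud Map option above notes that DNS-based discovery has no built-in load balancing, so a caller ends up choosing among the addresses that a MULTIVALUE lookup returns. The following is a minimal Python sketch of that selection step, not the Mavis implementation. The helper names (`cloud_map_hostname`, `resolve_ipv4`, `pick_task_address`), the `qa` environment name, the example IP addresses, and the random-choice strategy are all assumptions made for illustration; only the hostname pattern and port 4000 come from the ADR.

```python
import random
import socket
from typing import Callable, List, Sequence


def cloud_map_hostname(service: str, environment: str) -> str:
    """Build a private DNS name following the ADR's pattern
    (`web.mavis.${environment}.aws-int`)."""
    return f"{service}.mavis.{environment}.aws-int"


def resolve_ipv4(hostname: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses DNS currently reports for hostname.

    With a MULTIVALUE routing policy, one lookup can return several
    A records, one per registered task.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def pick_task_address(
    hostname: str,
    port: int,
    resolver: Callable[[str, int], List[str]] = resolve_ipv4,
    chooser: Callable[[Sequence[str]], str] = random.choice,
) -> str:
    """Pick one task address from the resolved set.

    Random choice is an assumed strategy; it spreads calls across
    tasks without any load balancer in front of them.
    """
    addresses = resolver(hostname, port)
    if not addresses:
        raise LookupError(f"no addresses resolved for {hostname}")
    return chooser(addresses)


if __name__ == "__main__":
    # Hypothetical environment name and a fake resolver, so the sketch
    # runs outside the VPC where the private namespace does not resolve.
    host = cloud_map_hostname("web", "qa")
    fake_resolver = lambda _host, _port: ["10.0.1.5", "10.0.2.7"]
    print(host, pick_task_address(host, 4000, resolver=fake_resolver))
```

In practice, an HTTP client that is simply given the hostname lets the operating system resolver pick an address, which is consistent with the ADR's note that no application changes are needed beyond using the DNS names. The sketch only makes the selection step explicit.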