Description
What happened:
Multi-tenant Architecture in AWS EKS:
- One shared load balancer
- One shared ingress controller
- One Ingress resource per tenant
- One ClusterIP Service per tenant
- Dedicated Route53 alias records per tenant
Load Balancing:
- AWS Network Load Balancer (NLB), managed directly via Terraform.
- NLB listeners are bound to target groups.
- Target groups are attached to Auto Scaling groups using aws_autoscaling_attachment (see the sketch following this list).
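For concreteness, a hedged Terraform sketch of the wiring above. Resource and variable names are illustrative rather than our exact code; the target-group port matches the controller Service's HTTPS NodePort shown later in this report.

variable "vpc_id" {}                                  # hypothetical
variable "public_subnet_ids" { type = list(string) }  # hypothetical
variable "ingress_asg_name" {}                        # hypothetical

resource "aws_lb" "shared" {
  name               = "shared-nlb"
  load_balancer_type = "network"
  subnets            = var.public_subnet_ids
}

resource "aws_lb_target_group" "https" {
  name     = "ingress-https"
  port     = 30443        # the controller Service's HTTPS NodePort
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.shared.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.https.arn
  }
}

# Bind the target group to the node group's Auto Scaling group, so every
# instance in the group is registered on the NodePort.
resource "aws_autoscaling_attachment" "https" {
  autoscaling_group_name = var.ingress_asg_name
  lb_target_group_arn    = aws_lb_target_group.https.arn
}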
Ingress Controller:
- One central NGINX Ingress Controller.
- The associated pods run on a node group reserved for this deployment (isolated via taints), one pod per node; the node instance type is an AWS t3.micro.
- Exposed via a Service of type NodePort.
- Configured globally for SSL passthrough (via the --enable-ssl-passthrough flag). A hedged helm_release sketch follows this list.
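A minimal sketch of the helm_release resource used to install the controller (see also "How was the ingress-nginx-controller installed" below). The values mirror the configuration visible in the kubectl output further down; treat this as a best-effort reconstruction rather than our literal code.

resource "helm_release" "nginx_ingress" {
  name       = "nginx-ingress"
  namespace  = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  version    = "4.11.5"

  values = [yamlencode({
    controller = {
      replicaCount = 2
      extraArgs    = { "enable-ssl-passthrough" = "true" }
      service = {
        type      = "NodePort"
        nodePorts = { http = 30080, https = 30443 }
      }
      # Pin the pods to the dedicated, tainted node group.
      nodeSelector = { node_type = "central-ingress" }
      tolerations = [{
        key    = "dedicated"
        value  = "ingress"
        effect = "NoSchedule"
      }]
    }
  })]
}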
Ingress Resources:
- The "nginx.ingress.kubernetes.io/ssl-passthrough" annotation is set to "true"
- The "nginx.ingress.kubernetes.io/force-ssl-redirect" annotation is set to "true"
- The "kubernetes.io/ingress.class" annotation is set to "nginx" (a per-tenant sketch follows this list)
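A sketch of one tenant's Ingress as we create it from Terraform; the tenant-a names, hostname, and backend port are hypothetical placeholders, while the annotations are verbatim from our setup.

resource "kubernetes_ingress_v1" "tenant_a" {
  metadata {
    name      = "tenant-a"        # hypothetical tenant name
    namespace = "tenant-a"
    annotations = {
      "kubernetes.io/ingress.class"                    = "nginx"
      "nginx.ingress.kubernetes.io/ssl-passthrough"    = "true"
      "nginx.ingress.kubernetes.io/force-ssl-redirect" = "true"
    }
  }

  spec {
    rule {
      host = "tenant-a.example.com"   # hypothetical per-tenant hostname
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "tenant-a"       # the tenant's ClusterIP Service
              port { number = 443 }   # passthrough: TLS terminates at the pod
            }
          }
        }
      }
    }
  }
}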
Route53 records:
- All per-tenant entries are alias records pointing at the shared load balancer's hostname (see the sketch following this list).
- The TTL is inherited from the load balancer's setting: 60 seconds.
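And a sketch of one tenant's alias record; the hosted-zone variable and hostname are hypothetical, and aws_lb.shared refers to the NLB sketch above. Alias records carry no TTL of their own.

variable "hosted_zone_id" {}  # hypothetical

resource "aws_route53_record" "tenant_a" {
  zone_id = var.hosted_zone_id
  name    = "tenant-a.example.com"  # hypothetical per-tenant hostname
  type    = "A"

  # Alias to the shared NLB; the TTL is inherited from the target (~60s).
  alias {
    name                   = aws_lb.shared.dns_name
    zone_id                = aws_lb.shared.zone_id
    evaluate_target_health = false
  }
}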
Testing Client: All testing was done using a Chrome browser.
What you expected to happen:
Updating a tenant's Route53 alias record should cut traffic over to the shared load balancer without any certificate errors; instead we saw the behavior described below.
Scenario Leading to Bug:
- While developing this feature, we got to a stable state where the shared configuration, as well as two mock tenants, were running smoothly.
- We made a change to the load balancer that caused its hostname to change.
- Immediately afterward, we updated the Route53 record for only one of the two tenants; it continued to run smoothly.
- After a few days, we restarted the nginx-ingress pods. About 20 minutes later, we updated the other tenant's Route53 record. This is when we saw the bug.
Bug Description:
There was a short period of intermittent failure. For about 5 minutes, responses were arbitrarily served with the controller's default "Kubernetes Ingress Controller Fake Certificate", mixed in with successful responses. The fake certificate's validity period indicates those responses came from the new nginx-ingress pods.
Other
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6:/etc/nginx$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.11.5
Build: 97ffeeee0fabd4b1c6b1cdabeaed881faab612de
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.5
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.10-eks-bc803b4
WARNING: version difference between client (1.28) and server (1.30) exceeds the supported minor version skew of +/-1
Environment:
- Cloud provider or hardware configuration: AWS EKS cluster
- OS (e.g. from /etc/os-release):
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.21.3
- Kernel (e.g. uname -a):
Linux nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 6.1.131-143.221.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025 x86_64 Linux
- Install tools: Terraform
- How was the ingress-nginx-controller installed: via a helm_release resource in Terraform (see the sketch under "Ingress Controller" above).
Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.11.5
helm.sh/chart=ingress-nginx-4.11.5
Annotations: meta.helm.sh/release-name: nginx-ingress
meta.helm.sh/release-namespace: ingress-nginx
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide
$ kubectl -n ingress-nginx get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 1/1 Running 0 51m 192.168.205.164 ip-192-168-231-33.ec2.internal <none> <none>
pod/nginx-ingress-ingress-nginx-controller-7567b569c4-smzh9 1/1 Running 0 10m 192.168.67.143 ip-192-168-119-202.ec2.internal <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/nginx-ingress-ingress-nginx-controller NodePort 10.100.79.172 <none> 80:30080/TCP,443:30443/TCP 51m app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
service/nginx-ingress-ingress-nginx-controller-admission ClusterIP 10.100.107.25 <none> 443/TCP 51m app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-ingress-ingress-nginx-controller 2/2 2 2 51m controller registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-ingress-ingress-nginx-controller-7567b569c4 2 2 2 51m controller registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7567b569c4
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
$ kubectl -n ingress-nginx describe pod/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6
Name: nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6
Namespace: ingress-nginx
Priority: 0
Service Account: nginx-ingress-ingress-nginx
Node: ip-192-168-231-33.ec2.internal/192.168.231.33
Start Time: Thu, 17 Apr 2025 14:41:19 -0400
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.11.5
helm.sh/chart=ingress-nginx-4.11.5
pod-template-hash=7567b569c4
Annotations: <none>
Status: Running
IP: 192.168.205.164
IPs:
IP: 192.168.205.164
Controlled By: ReplicaSet/nginx-ingress-ingress-nginx-controller-7567b569c4
Containers:
controller:
Container ID: containerd://982a378d94a83f8e8969fe99e621fd23cb439b40a339f5a0d0265b8cd411a621
Image: registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb
Image ID: registry.k8s.io/ingress-nginx/controller@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
--election-id=nginx-ingress-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--enable-metrics=false
--enable-ssl-passthrough=true
State: Running
Started: Thu, 17 Apr 2025 14:41:30 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hmwfk (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-ingress-nginx-admission
Optional: false
kube-api-access-hmwfk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
node_type=central-ingress
Tolerations: dedicated=ingress:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 52m default-scheduler 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Normal Scheduled 51m default-scheduler Successfully assigned ingress-nginx/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 to ip-192-168-231-33.ec2.internal
Normal Pulling 51m kubelet Pulling image "registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb"
Normal Pulled 51m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb" in 9.635s (9.635s including waiting). Image size: 105563032 bytes.
Normal Created 51m kubelet Created container controller
Normal Started 51m kubelet Started container controller
Normal RELOAD 26m (x2 over 51m) nginx-ingress-controller NGINX reload triggered due to a change in configuration
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
$ kubectl -n ingress-nginx describe service/nginx-ingress-ingress-nginx-controller
Name: nginx-ingress-ingress-nginx-controller
Namespace: ingress-nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.11.5
helm.sh/chart=ingress-nginx-4.11.5
Annotations: meta.helm.sh/release-name: nginx-ingress
meta.helm.sh/release-namespace: ingress-nginx
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.100.79.172
IPs: 10.100.79.172
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 30080/TCP
Endpoints: 192.168.205.164:80,192.168.67.143:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 30443/TCP
Endpoints: 192.168.205.164:443,192.168.67.143:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
- Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
kubectl -n namespace describe ingress.networking.k8s.io/name
Name: name
Labels: <none>
Namespace: namespace
Address: 10.100.79.172
Ingress Class: <none>
Default backend: <default>
Rules: REDACTED
Annotations: kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/force-ssl-redirect: true
nginx.ingress.kubernetes.io/ssl-passthrough: true
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 29m (x2 over 29m) nginx-ingress-controller Scheduled for sync
Normal Sync 29m (x2 over 29m) nginx-ingress-controller Scheduled for sync
Normal Sync 15m nginx-ingress-controller Scheduled for sync
How to reproduce this issue: the problem is stochastic; we have not found a reliable reproduction (see the Final Note below).
Anything else we need to know:
Notes
We saw this failure in an earlier development stage as well. In that scenario, the nginx ingress controller was exposed via a Service of type LoadBalancer, which automatically provisioned Classic Load Balancers (CLBs). The intermittent behavior lasted much longer in that setup: a few hours.
Final Note
We're not entirely sure the issue lies with the NGINX Ingress Controller, so we've detailed our entire setup. Since the problem is stochastic and hard to reproduce, it has been very difficult to investigate. We were hoping you might have encountered something similar with the Ingress Controller. Even just confirming that this behavior only occurs for a limited time following the NXDOMAIN window around a record update would give us much more confidence in using this configuration in production. Lastly, we did read the security warnings about using this controller in multi-tenant setups and followed the recommended practices.