
Intermittent TLS termination when using SSL passthrough after the target domain’s DNS has been invalid for some time #13238

Open
@irugina1

Description

What happened:

Multi-tenant Architecture in AWS EKS:

  • One shared load balancer
  • One shared ingress controller
  • One Ingress resource per tenant
  • One ClusterIP Service per tenant
  • Dedicated Route53 alias records per tenant

Load Balancing:

  • AWS Network Load Balancer (NLB) managed directly via Terraform configuration.
  • NLB Listeners are bound to Target Groups.
  • Target Groups are bound to autoscaling groups using aws_autoscaling_attachment (a sample target-health check is sketched below).
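
For reference, which instances the NLB currently considers healthy can be checked with the AWS CLI along these lines (the target group ARN is a placeholder, not our real one):

$ aws elbv2 describe-target-health \
    --target-group-arn <https-target-group-arn> \
    --query 'TargetHealthDescriptions[].{Target:Target.Id,Port:Target.Port,State:TargetHealth.State}' \
    --output table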

Ingress Controller:

  • One central Nginx Ingress Controller.
  • The associated pods run on a node group reserved for this deployment (i.e. using taints). We run one pod per node. The node’s instance type is an AWS t3.micro.
  • Exposed via a Service of type NodePort.
  • Configured globally for SSL passthrough via the --enable-ssl-passthrough flag (a sample Helm invocation is sketched below).
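
For reference, a minimal sketch of how this setup might be expressed through the Helm chart; release name, namespace, and node ports are illustrative and assume the standard chart values controller.service.type, controller.service.nodePorts, and controller.extraArgs:

$ helm upgrade --install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace ingress-nginx \
    --set controller.service.type=NodePort \
    --set controller.service.nodePorts.http=30080 \
    --set controller.service.nodePorts.https=30443 \
    --set controller.extraArgs.enable-ssl-passthrough=true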

Ingress Resources:

  • The "nginx.ingress.kubernetes.io/ssl-passthrough" annotation is set to "true"
  • The "nginx.ingress.kubernetes.io/force-ssl-redirect" annotation is set to "true"
  • The "kubernetes.io/ingress.class" annotation is set to "nginx" (see the check below)
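
A quick way to confirm the annotations actually applied on a given tenant Ingress (namespace and name are placeholders):

$ kubectl -n <tenant-namespace> get ingress <tenant-ingress> \
    -o jsonpath='{.metadata.annotations}{"\n"}'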

Route53 records

  • All per-tenant entries are aliases to the shared load balancer hostname.
  • They inherit the TTL from the load balancer setting: 60 seconds (see the dig check below).
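
The alias resolution and the inherited 60-second TTL can be spot-checked with dig (the tenant hostname and load balancer hostname are placeholders):

# Expect A records for the shared load balancer, with a TTL of roughly 60 seconds
$ dig +noall +answer tenant-a.example.com
# For comparison, resolve the shared load balancer hostname directly
$ dig +short <shared-nlb-hostname>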

Testing Client: All testing was done using a Chrome browser.

What you expected to happen:

Scenario Leading to Bug:

  • While developing this feature, we reached a stable state in which the shared configuration and two mock tenants were running smoothly.
  • We made a change to the load balancer that caused its hostname to change.
  • Immediately afterwards, we updated the Route53 record for only one of the two tenants, and that tenant continued to run smoothly.
  • A few days later, we restarted the nginx-ingress pods. About 20 minutes after that, we updated the other tenant's Route53 record. This is when we saw the bug.

Bug Description:
There was a short period of intermittent failure: for about 5 minutes, responses presenting the "Fake Nginx Ingress Controller Certificate" were mixed in with successful responses. The fake certificate's validity period indicates those responses were served by the new nginx-ingress pods.
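
To observe the intermittency outside the browser, repeated TLS handshakes can be sampled along these lines (the hostname is a placeholder for a tenant domain); the fake certificate's subject and validity period stand out from the tenant certificate:

$ for i in $(seq 1 30); do
    echo | openssl s_client -connect tenant-a.example.com:443 \
      -servername tenant-a.example.com 2>/dev/null \
      | openssl x509 -noout -subject -enddate
    sleep 10
  done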

Other

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6:/etc/nginx$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.11.5
  Build:         97ffeeee0fabd4b1c6b1cdabeaed881faab612de
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.10-eks-bc803b4
WARNING: version difference between client (1.28) and server (1.30) exceeds the supported minor version skew of +/-1

Environment:

  • Cloud provider or hardware configuration: AWS EKS cluster
  • OS (e.g. from /etc/os-release):
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.21.3
  • Kernel (e.g. uname -a):
Linux nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 6.1.131-143.221.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025 x86_64 Linux
  • Install tools: Terraform

  • How was the ingress-nginx-controller installed: via a helm_release resource in Terraform.

  • Current State of the controller:

    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=nginx-ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.5
              helm.sh/chart=ingress-nginx-4.11.5
Annotations:  meta.helm.sh/release-name: nginx-ingress
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -A -o wide
$ kubectl -n ingress-nginx get all -o wide
NAME                                                          READY   STATUS    RESTARTS   AGE   IP                NODE                              NOMINATED NODE   READINESS GATES
pod/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6   1/1     Running   0          51m   192.168.205.164   ip-192-168-231-33.ec2.internal    <none>           <none>
pod/nginx-ingress-ingress-nginx-controller-7567b569c4-smzh9   1/1     Running   0          10m   192.168.67.143    ip-192-168-119-202.ec2.internal   <none>           <none>

NAME                                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
service/nginx-ingress-ingress-nginx-controller             NodePort    10.100.79.172   <none>        80:30080/TCP,443:30443/TCP   51m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
service/nginx-ingress-ingress-nginx-controller-admission   ClusterIP   10.100.107.25   <none>        443/TCP                      51m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx

NAME                                                     READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                     SELECTOR
deployment.apps/nginx-ingress-ingress-nginx-controller   2/2     2            2           51m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb   app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx

NAME                                                                DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                                                                                     SELECTOR
replicaset.apps/nginx-ingress-ingress-nginx-controller-7567b569c4   2         2         2       51m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb   app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7567b569c4
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
$ kubectl -n ingress-nginx describe  pod/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6
Name:             nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6
Namespace:        ingress-nginx
Priority:         0
Service Account:  nginx-ingress-ingress-nginx
Node:             ip-192-168-231-33.ec2.internal/192.168.231.33
Start Time:       Thu, 17 Apr 2025 14:41:19 -0400
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=nginx-ingress
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.5
                  helm.sh/chart=ingress-nginx-4.11.5
                  pod-template-hash=7567b569c4
Annotations:      <none>
Status:           Running
IP:               192.168.205.164
IPs:
  IP:           192.168.205.164
Controlled By:  ReplicaSet/nginx-ingress-ingress-nginx-controller-7567b569c4
Containers:
  controller:
    Container ID:    containerd://982a378d94a83f8e8969fe99e621fd23cb439b40a339f5a0d0265b8cd411a621
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
      --election-id=nginx-ingress-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-metrics=false
      --enable-ssl-passthrough=true
    State:          Running
      Started:      Thu, 17 Apr 2025 14:41:30 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hmwfk (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-ingress-nginx-admission
    Optional:    false
  kube-api-access-hmwfk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
                             node_type=central-ingress
Tolerations:                 dedicated=ingress:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From                      Message
  ----     ------            ----               ----                      -------
  Warning  FailedScheduling  52m                default-scheduler         0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Scheduled         51m                default-scheduler         Successfully assigned ingress-nginx/nginx-ingress-ingress-nginx-controller-7567b569c4-np4v6 to ip-192-168-231-33.ec2.internal
  Normal   Pulling           51m                kubelet                   Pulling image "registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb"
  Normal   Pulled            51m                kubelet                   Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.11.5@sha256:a1cbad75b0a7098bf9325132794dddf9eef917e8a7fe246749a4cea7ff6f01eb" in 9.635s (9.635s including waiting). Image size: 105563032 bytes.
  Normal   Created           51m                kubelet                   Created container controller
  Normal   Started           51m                kubelet                   Started container controller
  Normal   RELOAD            26m (x2 over 51m)  nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
$ kubectl -n ingress-nginx describe  service/nginx-ingress-ingress-nginx-controller
Name:                     nginx-ingress-ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=nginx-ingress
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.11.5
                          helm.sh/chart=ingress-nginx-4.11.5
Annotations:              meta.helm.sh/release-name: nginx-ingress
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=nginx-ingress,app.kubernetes.io/name=ingress-nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.100.79.172
IPs:                      10.100.79.172
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30080/TCP
Endpoints:                192.168.205.164:80,192.168.67.143:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30443/TCP
Endpoints:                192.168.205.164:443,192.168.67.143:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
  • Current state of ingress object, if applicable:
    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
kubectl -n namespace describe  ingress.networking.k8s.io/name
Name:             name
Labels:           <none>
Namespace:        namespace
Address:          10.100.79.172
Ingress Class:    <none>
Default backend:  <default>
Rules: REDACTED
Annotations:                 kubernetes.io/ingress.class: nginx
                             nginx.ingress.kubernetes.io/force-ssl-redirect: true
                             nginx.ingress.kubernetes.io/ssl-passthrough: true
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    29m (x2 over 29m)  nginx-ingress-controller  Scheduled for sync
  Normal  Sync    29m (x2 over 29m)  nginx-ingress-controller  Scheduled for sync
  Normal  Sync    15m                nginx-ingress-controller  Scheduled for sync
  • If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag

  • Others:

    • Any other related information, such as:
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:

Anything else we need to know:

Notes

I saw this failure in an earlier development stage as well. In that setup, the nginx ingress controller was exposed via a Service of type LoadBalancer, which automatically provisioned CLB load balancers. The intermittent behavior lasted much longer there: a few hours.

Final Note

We're not entirely sure the issue lies with the NGINX Ingress Controller, so we've detailed our entire setup. Since the problem is stochastic and hard to reproduce, it has been very difficult to investigate. We were hoping you might have encountered something similar with the Ingress Controller. Even just confirming that this behavior only occurs for a limited time after a previously NXDOMAIN record is set up would give us much more confidence in using this configuration in production. Lastly, we did read the security warnings about using this controller in multi-tenant setups and followed the recommended practices.
