REC Cluster doesn't deploy - Azure AKS #275

@aammundi

Hi,
While deploying a REC cluster on Azure AKS, I followed the steps here: https://redis.io/docs/latest/operate/kubernetes/deployment/quick-start/

I have made several attempts. In almost all of them, the Redis pods go into CrashLoopBackOff and are eventually killed.
The experiment I'm doing is fairly repeatable.

  1. create resource group (Azure)
  2. create k8s cluster
  3. (try to) deploy REC - REC pods never come up
  4. delete resource group (which deletes all underlying resources, PVCs, etc.)
  5. back to step 1

The reason for these iterations is that I had issues with node pools and such, and eliminated those issues one by one.
Once I had the right node pools and requests/limits in place, the cluster did come up once. At that point I decided to formalize and clean up my code and retry from the top.

However, it's back to the crash loop.

Some observations from logs:
2024-07-03 00:40:52,210 - services-rigger.rs - INFO - got an exception while trying to communicate with Redis Enterprise cluster: HTTPSConnectionPool(host='redis', port=9443): Max retries exceeded with url: /v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1d1988ac70>: Failed to establish a new connection: [Errno 111] Connection refused'))

The host redis is the ClusterIP service name. I checked with dnsutils, and it resolves:

kubectl exec -i -t dnsutils -- nslookup redis
Server:		10.0.0.10
Address:	10.0.0.10#53

Name:	redis.ttinfra.svc.cluster.local
Address: 10.0.44.57
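
Note that a successful nslookup only shows that the Service name maps to a ClusterIP. The [Errno 111] in the log is a refused TCP connection, which is a separate failure: nothing accepted the connection on port 9443. This is typically what a Service with no ready backing pods looks like, and it would be consistent with the REC pods crash-looping. Below is a minimal sketch that separates the two checks, assuming it is run from a pod in the same namespace with Python available. The function name probe and its return labels are made up for illustration.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (hypothetical helper)."""
    try:
        # Step 1: name resolution, the part nslookup already verified.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Use the first resolved address only; enough for a quick check.
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            # Step 2: the TCP handshake itself.
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            # Same condition as "[Errno 111] Connection refused" in the log.
            return "refused"
        except socket.timeout:
            return "timeout"
        except OSError:
            return "unreachable"
    return "open"
```

If probe("redis", 9443) returns "refused" while the name resolves, DNS is not the problem, and the next thing to look at is why the pods behind the Service are not ready.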

Attached is a log generated from log_collector.
redis_enterprise_k8s_debug_info_20240702-181115.tar.gz
