Hi,
While deploying a REC cluster on Azure AKS, I followed the steps here: https://redis.io/docs/latest/operate/kubernetes/deployment/quick-start/
I have made several attempts. In almost all of them, the Redis pods go into CrashLoopBackOff and are eventually killed.
The experiment I'm doing is fairly repeatable.
- create a resource group (Azure)
- create a Kubernetes cluster
- (try to) deploy the REC; the REC pods never come up
- delete the resource group (which deletes all underlying resources, PVCs, etc.)
- back to step 1
The reason for these iterations is that I had issues with node pools and the like, and eliminated them one by one.
Once I had the right node pools and resource requests/limits in place, the cluster did come up once. At that point I decided to formalize and clean up my code and retry from the top.
However, it is back to the crash loop.
Some observations from logs:
2024-07-03 00:40:52,210 - services-rigger.rs - INFO - got an exception while trying to communicate with Redis Enterprise cluster: HTTPSConnectionPool(host='redis', port=9443): Max retries exceeded with url: /v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1d1988ac70>: Failed to establish a new connection: [Errno 111] Connection refused'))
Here, redis is the ClusterIP service. I checked with dnsutils and the name is resolvable:
kubectl exec -i -t dnsutils -- nslookup redis
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: redis.ttinfra.svc.cluster.local
Address: 10.0.44.57
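Since the name resolves but the log reports [Errno 111] Connection refused, it may help to separate the DNS step from the TCP step. The following is a minimal sketch of such a check, assuming it is run from a pod in the same namespace that has Python available (the dnsutils image may not). The function name probe is hypothetical; the host redis and port 9443 are taken from the log line above.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify reachability of host:port.

    Returns one of: 'dns_failure', 'refused', 'timeout', 'unreachable', 'open'.
    """
    try:
        # Step 1: name resolution (what nslookup already confirmed).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            # Errno 111: the address answered, but nothing accepts on the port.
            return "refused"
        except socket.timeout:
            return "timeout"
        except OSError:
            return "unreachable"
    return "open"


if __name__ == "__main__":
    # Host and port as they appear in the services-rigger log line.
    print(probe("redis", 9443))
```

A result of refused together with a successful nslookup would suggest that the service exists but no REC pod is accepting connections on 9443 behind it, which fits pods that are still crash-looping. It would not by itself explain why they crash.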
Attached is a log generated from log_collector.
redis_enterprise_k8s_debug_info_20240702-181115.tar.gz