In this dojo, you will learn to use custom metrics to scale a service deployed in a Kubernetes cluster. You will even scale the service down to zero instances when possible, to avoid wasting resources.
For this exercise, you will need:
Make sure Docker is running on your computer. Otherwise, none of this will work.
Create a local Kubernetes cluster with the exercise ready:
```bash
./scripts/bootstrap.sh
```

You now have a Kubernetes cluster running on your machine. Have a look at what is inside:
```bash
kubectl get pods -A
```

An application is deployed to the default namespace. It has a simple
event-driven architecture:
- Users send requests to the producer.
- The producer pushes items to a queue stored in a Redis instance.
- A consumer pops items from the queue.
- The consumer does some work for each item.
Try it out! First, tell the producer to publish a message:
```bash
curl -X POST producer.vcap.me/publish
# or
http POST producer.vcap.me/publish
```

Now check the consumer's logs:
```bash
kubectl logs -l app.kubernetes.io/name=consumer
```

Notice that the consumer takes 1 second to process an item from the queue. Now, publish 10 messages at once:
```bash
curl -X POST producer.vcap.me/publish/10
# or
http POST producer.vcap.me/publish/10
```

The consumer's logs show that it only processes one item at a time. By adding more instances of the consumer, your application can process items faster. Scale the consumer to 3 instances:
```bash
kubectl scale deployment consumer --replicas=3
```

Now publish 10 messages again. By watching all of the consumers' logs at once, you should see that they each process one item at a time. Your application now has 3 times the throughput!
You might have guessed that you are going to scale the number of consumer instances based on the number of items waiting to be processed. If you did, you are right!
Let's walk through how you are going to do this.
First, you need to deploy a monitoring system to measure the length of your Redis queue. You are going to do this with Prometheus and the Redis exporter.
Second, you need to expose this metric inside the Kubernetes API. The native Kubernetes pod autoscaler does not know how to query Prometheus. You are going to use the Prometheus adapter to serve as a bridge between the Kubernetes API server and Prometheus.
Finally, you are going to deploy a horizontal pod autoscaler that will set the number of replicas of your consumer based on the metric found in the Kubernetes API.
This is what the target architecture looks like:
The simplest way to deploy Prometheus to your cluster is with the prometheus-community/kube-prometheus-stack Helm chart.
Here is what you need to do:
- Set Prometheus's scrape interval to 5s.
- Make Prometheus watch all Rules and ServiceMonitors in the cluster.
- Create an Ingress for Prometheus with the `prometheus.vcap.me` hostname.
- Deploy Prometheus to the `prometheus` namespace.
- Go to the Prometheus UI: http://prometheus.vcap.me.
Ready? Set. Go!
Hint n°1
Have a look at the chart's values.yaml file. Everything you need is in there.
Hint n°2
You can configure Prometheus with the prometheus.prometheusSpec field.
Hint n°3
Setting matchLabels to an empty object makes it match everything.
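If you want to check your direction before looking at the solution, here is a rough sketch of what the chart's values could look like. The field names are taken from the kube-prometheus-stack chart, but double-check them against the chart's values file for your chart version:

```yaml
# values.yaml — a minimal sketch, not the full solution
prometheus:
  prometheusSpec:
    scrapeInterval: 5s
    # With these flags set to false, Prometheus selects all Rules and
    # ServiceMonitors in the cluster, not just the ones the chart created.
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
  ingress:
    enabled: true
    hosts:
      - prometheus.vcap.me
```

You would then deploy with something like `helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace -f values.yaml`.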
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The easiest way to deploy the Redis exporter is with the prometheus-community/prometheus-redis-exporter chart.
Here is what you need to do:
- Configure the chart to create a ServiceMonitor for the exporter.
- Configure the exporter to connect to the `redis-master` Service.
- Configure the exporter to watch a single key: `padok`.
- Deploy the exporter to the `default` namespace.
- See your application activity in the Prometheus UI with this query:

```
rate(redis_commands_processed_total{service="prometheus-redis-exporter",namespace="default"}[20s])
```
You can do it!
Hint n°1
Have a look at the chart's values.yaml file. Everything you need is in there.
Hint n°2
Have a look at the exporter's GitHub repository.
Hint n°3
Did you notice the REDIS_EXPORTER_CHECK_SINGLE_KEYS variable?
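As a sketch, the exporter's values could look something like this. The exact field names may differ between chart versions, so treat this as a starting point rather than the answer:

```yaml
# values.yaml — a sketch; check the chart's values file for exact fields
serviceMonitor:
  enabled: true
# Point the exporter at the redis-master Service in the default namespace.
redisAddress: redis://redis-master:6379
# Only watch the "padok" key (the queue used by this exercise).
env:
  - name: REDIS_EXPORTER_CHECK_SINGLE_KEYS
    value: padok
```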
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The Redis exporter has a feature that is going to be a problem for you: when your Redis queue is empty, the exporter does not expose a metric for it. See for yourself:
- Run this query in the Prometheus UI: `redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}`
- Publish a small number of items to your queue.
- See how the metric exists when there are items in the queue, but not when the queue is empty.
Here is what you need to do to work around this:
- Add a PrometheusRule resource to the `consumer` chart.
- In the rule, define a new metric called `redis_items_in_queue`.
- In the rule, write a PromQL query that makes it so that:
  - When `redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}` exists, `redis_items_in_queue` has the same value and labels.
  - When `redis_key_size` is null, `redis_items_in_queue` has a value of 0 with the following labels: `{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}`
- Update the `consumer`'s release (there's a script for that).
- Check that this query returns a value whether the queue is empty or not: `redis_items_in_queue{service="prometheus-redis-exporter",namespace="default",key="padok"}`
What are you waiting for?
Hint n°1
This great blog article has an example of a PrometheusRule.
Hint n°2
Have a look at the absent and clamp_max Prometheus functions, and the or
keyword.
Hint n°3
This is the PromQL query to use in your PrometheusRule:

```
redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}
or
clamp_max(absent(redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}), 0)
```
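Putting this together, a recording rule template in the `consumer` chart could look roughly like this. The metadata is illustrative; the expression is the query above, with the namespace templated by Helm as the exercise suggests:

```yaml
# templates/prometheusrule.yaml — a sketch of the recording rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: consumer
spec:
  groups:
    - name: consumer
      rules:
        - record: redis_items_in_queue
          expr: |-
            redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}
            or
            clamp_max(absent(redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}), 0)
```

When the key exists, the left-hand side of `or` wins and keeps its value and labels; when it does not, `absent(...)` returns 1 with the labels from the selector, and `clamp_max(..., 0)` forces that value down to 0.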
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The easiest way to deploy the Prometheus Adapter is with the prometheus-community/prometheus-adapter chart.
Here is what you need to do:
- Configure the adapter to query the existing Prometheus service.
- Configure the adapter to expose all metrics starting with `redis_`.
- Deploy the adapter to the `prometheus` namespace.
- Read the number of items in your queue from the Kubernetes API:

```bash
kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/prometheus-redis-exporter/redis_items_in_queue' | jq
```
Get to it!
Hint n°1
Have a look at the chart's values.yaml file. Everything you need is in there.
Hint n°2
Have a look at this documentation. The introduction is very helpful. The Discovery and Querying sections might help you too.
Hint n°3
You need to define a custom rule. You only need to set the seriesQuery and
metricsQuery fields. Forget name and resources.
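A sketch of the adapter's values, assuming your kube-prometheus-stack release is named `prometheus` (the Prometheus Service name depends on the release name, so adjust the URL to match yours):

```yaml
# values.yaml — a sketch; check the chart's values file for exact fields
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.prometheus.svc
  port: 9090
rules:
  custom:
    # Expose every timeseries whose name starts with redis_, as-is.
    - seriesQuery: '{__name__=~"^redis_.*"}'
      metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
```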
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
This step is pretty self-explanatory. Here is what you need to do:
- Add a v2beta2 HorizontalPodAutoscaler resource to the `consumer` chart.
- Configure the HPA to scale the consumer's Deployment.
- Set `minReplicas` to `1` and `maxReplicas` to `20`.
- Scale the Deployment based on a metric of type `Object`.
- Scale based on the `redis_items_in_queue` timeseries with the `key=padok` label.
- Set a target average of 20 items per consumer replica.
- Update the `consumer`'s release (there's a script for that).
- Check that the number of consumers adapts to the number of items in the queue.
Are you ready? Go!
Hint n°1
This great blog article has an example of a v2beta2 HorizontalPodAutoscaler.
Hint n°2
The Prometheus adapter mapped the metrics to Kubernetes resources. This mapping is based on the timeseries' labels.
Hint n°3
The metrics are mapped to the prometheus-redis-exporter Service.
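Assembled from the requirements above, the HPA could look roughly like this (a sketch, not the chart's exact template):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Object
      object:
        # The adapter mapped the metric to the exporter's Service.
        describedObject:
          apiVersion: v1
          kind: Service
          name: prometheus-redis-exporter
        metric:
          name: redis_items_in_queue
          selector:
            matchLabels:
              key: padok
        # Target an average of 20 queued items per consumer replica.
        target:
          type: AverageValue
          averageValue: "20"
```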
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
Your local cluster has a special feature activated: your HPA can scale to 0 instances if you let it. Here is what you need to do:
- Set the consumer's HPA's minimum to 0.
- Update the `consumer` release.
- Check that the HPA scales to 0 instances when the Redis queue is empty.
Easy, wasn't it?
If you haven't already, have a look at your HPA:
```bash
kubectl get hpa consumer
```

You should notice that there is something wrong with what is displayed.
-9223372036854775808 is a strange number. You might know that it is equal to
-2^63. But why is your HPA displaying this nonsensical value? Is this a bug in
kubectl?
Check your HPA's status directly, without kubectl's user-friendly format:
```bash
kubectl get hpa consumer -o json | jq .status
```

The value is not there. There is a simple reason why. By default, kubectl
fetches your HPA from the autoscaling/v1 API, instead of the
autoscaling/v2beta2 API that we want to use. See for yourself:
```bash
kubectl get hpa consumer -o json | jq .apiVersion
```

Kubernetes translates between versions for you, so your HPA exists in both versions. You can access specific API versions like this:
```bash
kubectl get hpa.v2beta2.autoscaling consumer -o json | jq .apiVersion
```

Check your HPA's status in the autoscaling/v2beta2 API:
```bash
kubectl get hpa.v2beta2.autoscaling consumer -o json | jq .status
```

You should see the value there. The only way for this value to be in your HPA's raw status is if the Kubernetes controller manager put it there. This seems like a bug. Here is what you need to do:
- Go to the kubernetes/kubernetes repository.
- Find the line that causes the bug.
Debugging Kubernetes. This should be fun!
Hint n°1
The controller manager's code is in the pkg/controller directory.
Hint n°2
The HPA computes its status in the podautoscaler/replica_calculator.go file.
Hint n°3
In Go, dividing a floating-point zero by zero gives a special value: math.NaN().
Casting this value to a 64-bit integer results in -9223372036854775808, as
seen here.
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
The bug is right here. The Kubernetes controller divides resource utilisation by the number of replicas to compute an average. When the number of replicas is 0, the Kubernetes controller divides by 0! 😱
Once you are done with this exercise, you can destroy your local environment:
```bash
./scripts/teardown.sh
```

I hope you had fun and learned something!

