Optimizing Kubernetes Vertical Pod Autoscaler responsiveness

Posted on February 24, 2023 by 川井拓真

Guest post originally published on the Miraxia blog by 川井拓真

Japanese version here.

Few weeks ago, I was struggling to optimize the Vertical Pod Autoscaler performance. We’d been planning a presentation in my company, and it should be 5 to 10 minutes long, so the default VPA behavior was too slow for us. Eventually, I had to modify its source to change pod’s resource requests dynamically in minutes.

In this article, I describe how VPA works and a way to optimize its performance, including code modification.

What is Vertical Pod Autoscaler?

The Vertical Pod Autoscaler, VPA, enables us to define resource requests dynamically. It observes the cpu and memory utilization of pods, and updates these resource requests at runtime.

How Vertical Pod Autoscaler works?

The VPA is consists of the following three components.

The admission-controller register an admission webhook to modify pod creation request.

The recommender checks the resource usage of the target pods, and estimate the recommended resource requests, then update the VPA object associated to the target pods. This process is performed periodically.

Diagram showing flow from recommender check resource usage to metrics-server, however if the recommender update recommendation, will lead to VPA object (custom resource)

The updater checks the recommended resource requests recorded in the VPA object, and if the current resource requests of the target pod are not match with the recommendation, it deletes the pod.

Diagram flow showing flow from updater delete if resource requests don't match with recommendation, however if the updater check recommendation will lead to VPA object (custom resource)

When a pod of a deployment or a replica set has deleted by the updater, the replica set controller will detect the number of pods is not enough, and will try to create a new pod. The pod creation request is sent to the api-server, and api-server calls the webhook registered by the admission-controller. The admission-controller modifies the pod creation request according to the recommended resource requests values. Finally, the new pod created with the appropriate resource requests.

Diagram flow showing admission controller process to ReplicaSet Controller, new pod and VPA object (custom resource)

Change periodic task frequency

The admission-controller works synchronously when a pod creation request is made, so we can forget about the admission-controller in this article. It never affect the responsiveness of VPA.

Since the recommender and updater work periodically, their frequency is important. These can be modified by --recommender-interval and --updater-interval options.

You can change these options by modifying recommender-deployment.yaml and updater-deployment.yaml, like this:

Patch license: Apache-2.0 (same as original autoscaler)
https://github.com/kubernetes/autoscaler

diff --git a/vertical-pod-autoscaler/deploy/recommender-deployment.yaml b/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
index f45d87127..739c35da6 100644
--- a/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
+++ b/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
@@ -38,3 +38,9 @@ spec:
         ports:
         - name: prometheus
           containerPort: 8942
+        command:
+        - /recommender
+        - --v=4
+        - --stderrthreshold=info
+        - --prometheus-address=http://prometheus.monitoring.svc
+        - --recommender-interval=10s
diff --git a/vertical-pod-autoscaler/deploy/updater-deployment.yaml b/vertical-pod-autoscaler/deploy/updater-deployment.yaml
index a97478a8e..b367f1a04 100644
--- a/vertical-pod-autoscaler/deploy/updater-deployment.yaml
+++ b/vertical-pod-autoscaler/deploy/updater-deployment.yaml
@@ -43,3 +43,7 @@ spec:
           ports:
             - name: prometheus
               containerPort: 8943
+          args:
+          - --v=4
+          - --stderrthreshold=info
+          - --updater-interval=10s

The other options, --v, --stderrthreshold and --recommender-interval are default ones defined in recommender/Dockerfile and updater/Dockerfile.

Warning: --recommender-interval=10s and --updater-interval=10s are too frequent for normal usecases. Do not copy and paste this for your real-world cluster!

The recommendation algorithm

Despite the updater responsiveness could be controlled only by --updater-interval, for the recommender we need some more modification. To improve the responsiveness of the recommender, we have to understand the recommendation algorithm.

The recommender defines the lower and upper limit of the resource requests for pods, and the updater checks the current resource requests of a pod is in the recommended range. So the problem is, the recommender is designed to make these limits wider if the resource utilization of the pods are changing fast. This design is reasonable for the real-world use cases, but for our demonstration, I want them to react more quickly.

The recommender records resource utilization history of the target pods and makes recommendation range wider if its volatility is too high. You can reduce the effect of the volatility with --cpu-histogram-decay-half-life option.

Patch license: Apache-2.0 (same as original autoscaler)
https://github.com/kubernetes/autoscaler

diff --git a/vertical-pod-autoscaler/deploy/recommender-deployment.yaml b/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
index 739c35da6..09113a02e 100644
--- a/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
+++ b/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
@@ -44,3 +44,4 @@ spec:
         - --stderrthreshold=info
         - --prometheus-address=http://prometheus.monitoring.svc
         - --recommender-interval=10s
+        - --cpu-histogram-decay-half-life=10s

Warning: --cpu-histogram-decay-half-life=10s is too fast for normal usecases. Do not copy and paste this for your real-world cluster!

Another factor that makes recommendation range wider is the length of the resource utilization history.

According to the CreatePodResourceRecommender implementation, it combining the following estimators.

percentileEstimator : returns the actual resource utilization
marginEstimator : adds safety margin to the recommendation
confidenceMultiplier : make recommendation range wider regarding the resource utilization history length

The confidenceMultiplier make recommendation range wider with the folowing formulae.

If we have only 3 minutes long history, and the MeasuredResourceUtilization = 0.6 cpu, the confidenceMultiplier calculates as like this:

In this situation, while the pod’s resources.requests.cpu is in range [0.27 cpu, 289 cpu], nothing happens. In other words, the updater determines “Ah, the pod’s requesting resource is in the suggested range, so I have nothing to do so far”.

Since this suggestion range varies in the order of a day, as long as I use the default implementation, I never see any reaction of the VPA in our 5 to 10 minutes long presentation.

Note that the description in this section is bit inaccurate because I didn’t explain about resource utilization histogram, but the conclusion is almost correct, I think.

Modifying the recommendation algorithm

Unfortunately, the parameters determine the extent of the recommendation range are hardcoded. So I decided to modify the code.

The requirements for our presentation are…

The recommendation limits shouldn’t be affected by the hitory length
The recommendation should reflect the measured resource utilization
We should be able to manipulate the actual resource usage to be out of the recommendation range in minutes

I created the new design reflecting these requirements as follows. Simpler is better for manual manipulation.

This new design can be implemented as follows.

Patch license: Apache-2.0 (same as original autoscaler)
https://github.com/kubernetes/autoscaler

diff --git a/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go b/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go
index bc2320cca..cdce617fd 100644
--- a/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go
+++ b/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go
@@ -111,9 +111,9 @@ func CreatePodResourceRecommender() PodResourceRecommender {
        lowerBoundEstimator := NewPercentileEstimator(lowerBoundCPUPercentile, lowerBoundMemoryPeaksPercentile)
        upperBoundEstimator := NewPercentileEstimator(upperBoundCPUPercentile, upperBoundMemoryPeaksPercentile)

-       targetEstimator = WithMargin(*safetyMarginFraction, targetEstimator)
-       lowerBoundEstimator = WithMargin(*safetyMarginFraction, lowerBoundEstimator)
-       upperBoundEstimator = WithMargin(*safetyMarginFraction, upperBoundEstimator)
+       targetEstimator = WithMargin(0, targetEstimator)
+       lowerBoundEstimator = WithMargin(-0.3, lowerBoundEstimator)
+       upperBoundEstimator = WithMargin(0.3, upperBoundEstimator)

        // Apply confidence multiplier to the upper bound estimator. This means
        // that the updater will be less eager to evict pods with short history
@@ -126,7 +126,7 @@ func CreatePodResourceRecommender() PodResourceRecommender {
        // 12h history    : *3    (force pod eviction if the request is > 3 * upper bound)
        // 24h history    : *2
        // 1 week history : *1.14
-       upperBoundEstimator = WithConfidenceMultiplier(1.0, 1.0, upperBoundEstimator)
+       // upperBoundEstimator = WithConfidenceMultiplier(1.0, 1.0, upperBoundEstimator)

        // Apply confidence multiplier to the lower bound estimator. This means
        // that the updater will be less eager to evict pods with short history
@@ -140,7 +140,7 @@ func CreatePodResourceRecommender() PodResourceRecommender {
        // 5m history   : *0.6 (force pod eviction if the request is < 0.6 * lower bound)
        // 30m history  : *0.9
        // 60m history  : *0.95
-       lowerBoundEstimator = WithConfidenceMultiplier(0.001, -2.0, lowerBoundEstimator)
+       // lowerBoundEstimator = WithConfidenceMultiplier(0.001, -2.0, lowerBoundEstimator)

        return &podResourceRecommender{
                targetEstimator,

Conclusion

The Vertical Pod Autoscaler is designed as not modifying resource requests too frequent
Unfortunately, the parameters that determine how frequent the modification made are hardcoded
In order to react changes of resource utilization in the order of a minute, you’ll have to modify the source code

Related works by others

In-place Pod Vertical Scaling feature (Work in progress)

Yokohama, Japan