Originally published on Medium by Mohamed Ahmed

Kubernetes at its core is a resource management and orchestration tool. It is fine to spend day-1 operations exploring and playing around with its cool features to deploy, monitor and control your pods. However, you need to think about day-2 operations as well. You need to focus on questions like:

In this post I provide a high-level overview of the different scalability mechanisms inside Kubernetes and the best ways to make them serve your needs. Remember, to truly master Kubernetes, you need to master the different ways to manage the scale of cluster resources; that is the core promise of Kubernetes.

Configuring Kubernetes clusters to balance resources and performance can be challenging, and requires expert knowledge of the inner workings of Kubernetes. This is because the workload of your app or services isn’t constant; it fluctuates throughout the day, if not the hour. Think of it as a journey and an ongoing process.

Kubernetes Autoscaling Building Blocks

Effective Kubernetes autoscaling requires coordination between two layers of scalability: (1) pod-level autoscalers, which include the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA); both scale the resources available to your containers, and (2) cluster-level scalability, which is managed by the Cluster Autoscaler (CA); it scales the number of nodes inside your cluster up or down.

Horizontal Pod Autoscaler (HPA)

As the name implies, HPA scales the number of pod replicas. Most DevOps teams use CPU and memory as the triggers to scale the number of pod replicas up or down. However, you can configure it to scale your pods based on custom metrics, multiple metrics, or even external metrics.
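To make the CPU trigger concrete, here is a minimal sketch that builds an HPA manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The names `web-hpa` and `web`, the replica bounds, and the 70% CPU target are assumptions for illustration; the `autoscaling/v2` API version assumes a reasonably recent cluster.

```python
import json


def build_hpa_manifest(name, target_deployment, min_replicas, max_replicas, cpu_utilization):
    """Build an HPA manifest that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",  # assumes a cluster that serves autoscaling/v2
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count the HPA adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # One resource metric; more entries can be added for multiple metrics.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }


# Hypothetical names and values, chosen only for this example.
hpa_manifest = build_hpa_manifest("web-hpa", "web", 2, 10, 70)
print(json.dumps(hpa_manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` is one way to try it against a test cluster.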

High-level HPA workflow

High-level HPA workflow diagram
  1. HPA continuously checks the metric values you configured during setup, at a default interval of 30 seconds.
  2. If the specified threshold is met, HPA attempts to increase the number of pods.
  3. HPA mainly updates the number of replicas inside the Deployment or Replication Controller.
  4. The Deployment or Replication Controller then rolls out any additional pods needed.
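The replica count that HPA writes in step 3 can be sketched with the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric). The function below is a simplified illustration, not the controller’s actual code; the function name, the sample numbers, and the 10% tolerance band are assumptions for this example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale replicas by the ratio of current to target metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU with a 60% target: scale up to 6.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
# 62% against a 60% target is within tolerance: stay at 4.
print(desired_replicas(4, 62, 60, min_replicas=2, max_replicas=10))
# A large spike is capped at max_replicas (10).
print(desired_replicas(4, 300, 60, min_replicas=2, max_replicas=10))
```

The real controller also handles missing metrics, pods that are not yet ready, and stabilization windows, which this sketch leaves out.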

Consider these as you roll out HPA:

Vertical Pod Autoscaler

Vertical Pod Autoscaler (VPA) allocates more (or less) CPU or memory to existing pods. Think of it as giving pods some growth hormones :) It can work for both stateful and stateless pods, but it is built mainly for stateful services. However, you can use it for stateless pods as well if you would like to auto-correct the resources you initially allocated to your pods. VPA can also react to OOM (out of memory) events. VPA currently requires pods to be restarted in order to change their allocated CPU and memory. When VPA restarts pods, it respects the Pod Disruption Budget (PDB) to make sure the minimum required number of pods is always available. You can set the minimum and maximum resources that VPA can allocate to any of your pods. For example, you can cap the memory limit at 8 GB. This is particularly useful when you know that your current nodes cannot allocate more than 8 GB per container. Read the VPA’s official wiki page for the detailed spec and design.
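As a rough illustration of the min and max bounds described above, the sketch below builds a VPA manifest as a Python dictionary and prints it as JSON. It assumes the VPA add-on and its `autoscaling.k8s.io/v1` custom resource are installed in the cluster; the names `db-vpa` and `db`, the `Auto` update mode, and the lower bounds are hypothetical, and the 8 GB cap is written as `8Gi`.

```python
import json


def build_vpa_manifest(name, target_deployment, min_allowed, max_allowed):
    """Build a VPA manifest that bounds what VPA may allocate per container."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",  # assumes the VPA CRD is installed
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            # "Auto" lets VPA restart pods to apply new values.
            "updatePolicy": {"updateMode": "Auto"},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": "*",  # apply to every container in the pod
                        "minAllowed": min_allowed,
                        "maxAllowed": max_allowed,
                    }
                ]
            },
        },
    }


# Hypothetical workload; memory is capped at 8Gi as in the example above.
vpa_manifest = build_vpa_manifest(
    "db-vpa",
    "db",
    min_allowed={"cpu": "100m", "memory": "128Mi"},
    max_allowed={"cpu": "2", "memory": "8Gi"},
)
print(json.dumps(vpa_manifest, indent=2))
```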

VPA also has an interesting feature called the VPA Recommender. It watches the historical resource usage and OOM events of all pods to suggest new values for the “request” resources spec. The Recommender uses an algorithm that calculates memory and CPU values from historical metrics. It also provides an API that takes a pod descriptor and returns suggested resource requests.

It is worth mentioning that the VPA Recommender doesn’t set the “limit” of resources. This can cause pods to monopolize resources inside your nodes. I suggest you set a “limit” value at the namespace level to avoid runaway consumption of memory or CPU.
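One way to set a namespace-level limit is a `LimitRange` object. The sketch below builds one as a Python dictionary and prints it as JSON; the namespace `team-a`, the object name, and all the CPU and memory values are assumptions picked for illustration, so tune them to your own nodes.

```python
import json


def build_limit_range(name, namespace, default_limit, default_request, max_limit):
    """Build a LimitRange that applies default and maximum limits per container."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Limit applied when a container declares none.
                    "default": default_limit,
                    # Request applied when a container declares none.
                    "defaultRequest": default_request,
                    # Hard ceiling for any container limit in the namespace.
                    "max": max_limit,
                }
            ]
        },
    }


# Hypothetical namespace and values.
limit_range = build_limit_range(
    "container-limits",
    "team-a",
    default_limit={"cpu": "500m", "memory": "512Mi"},
    default_request={"cpu": "250m", "memory": "256Mi"},
    max_limit={"cpu": "2", "memory": "8Gi"},
)
print(json.dumps(limit_range, indent=2))
```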

High-level VPA workflow

High-level VPA workflow diagram
  1. VPA continuously checks the metric values you configured during setup, at a default interval of 10 seconds.
  2. If the threshold is met, VPA attempts to change the allocated memory and/or CPU.
  3. VPA mainly updates the resources inside the Deployment or Replication Controller specs.
  4. When pods are restarted, the new resources are applied to the newly created instances.

A few points to consider as you roll out the VPA:

Cluster Autoscaler

Cluster Autoscaler (CA) scales your cluster nodes based on pending pods. It periodically checks whether there are any pending pods and increases the size of the cluster if more resources are needed, provided the scaled-up cluster stays within the user-provided constraints. CA interfaces with the cloud provider to request more nodes or deallocate idle nodes. It works with GCP, AWS and Azure. Version 1.0 (GA) was released with Kubernetes 1.8.

High-level CA workflow

High-level CA workflow diagram
  1. The CA checks for pods in pending state at a default interval of 10 seconds.
  2. If one or more pods are in pending state because the cluster does not have enough available resources to allocate to them, the CA attempts to provision one or more additional nodes.
  3. When the node is granted by the cloud provider, the node is joined to the cluster and becomes ready to serve pods.
  4. The Kubernetes scheduler allocates the pending pods to the new node. If some pods are still in pending state, the process is repeated and more nodes are added to the cluster.
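The loop above can be approximated with a toy calculation: given the CPU requests of the pending pods, how many extra nodes are needed without exceeding the maximum cluster size? This is a deliberately simplified first-fit sketch and not the CA’s actual algorithm, which simulates the scheduler and considers much more than CPU. The function name, the single node size, and the millicore values are assumptions for illustration.

```python
def nodes_needed(pending_pod_cpus, node_cpu, current_nodes, max_nodes):
    """First-fit estimate of new nodes for pending pods (CPU in millicores).

    Returns (number_of_new_nodes, pods_still_pending).
    """
    free_on_new_nodes = []  # remaining CPU on each node we decide to add
    still_pending = []
    for cpu in pending_pod_cpus:
        if cpu > node_cpu:
            # This pod can never fit on this node type.
            still_pending.append(cpu)
            continue
        for i, free in enumerate(free_on_new_nodes):
            if free >= cpu:
                free_on_new_nodes[i] = free - cpu
                break
        else:
            # No added node has room: add another if the size limit allows it.
            if current_nodes + len(free_on_new_nodes) < max_nodes:
                free_on_new_nodes.append(node_cpu - cpu)
            else:
                still_pending.append(cpu)
    return len(free_on_new_nodes), still_pending


pending = [1500, 1500, 1000, 500]  # hypothetical CPU requests in millicores

# With room to grow, three 2000m nodes cover every pending pod.
print(nodes_needed(pending, node_cpu=2000, current_nodes=3, max_nodes=10))
# With a cap of 5 nodes, only two can be added and one pod stays pending.
print(nodes_needed(pending, node_cpu=2000, current_nodes=3, max_nodes=5))
```

The second call shows why the user-provided constraints matter: once the cap is reached, pods stay pending no matter how often the check runs.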

Consider these as you roll out the CA:

How Kubernetes Autoscalers Interact Together

If you would like to reach nirvana autoscaling your Kubernetes cluster, you will need to use pod-layer autoscalers together with the CA. The way they work with each other is relatively simple, as shown in the illustration below.

Kubernetes Autoscalers workflow
  1. HPA or VPA updates the pod replicas or the resources allocated to existing pods.
  2. If there are not enough nodes to run the pods after the scaling event, CA picks up the fact that some or all of the scaled pods are in pending state.
  3. CA allocates new nodes.
  4. Pods are scheduled on the provisioned nodes.

Common Mistakes

In different forums, such as Kubernetes Slack channels and StackOverflow questions, I’ve seen common issues caused by facts that many DevOps engineers miss while getting their feet wet with autoscalers.

HPA and VPA depend on metrics and some historical data. If you don’t have enough resources allocated, your pods will be OOM killed and never get a chance to generate metrics. Scaling may never take place in this case.

Scaling up is mostly a time-sensitive operation. You want your pods and cluster to scale fairly quickly, before your users experience any disruption or crashes in your application. You should consider the average time it can take your pods and cluster to scale up.

Best case scenario: 4 minutes

  1. 30 seconds: target metrics values are updated (this can take 30–60 seconds)
  2. 30 seconds: HPA checks the metrics values
  3. < 2 seconds: pods are created and go into pending state
  4. < 2 seconds: CA sees the pending pods and fires the calls to provision nodes
  5. 3 minutes: the cloud provider provisions the nodes and Kubernetes waits until they are ready (this can take up to 10 minutes, depending on multiple factors)

(Reasonable) worst case scenario: 12 minutes

  1. 60 seconds: target metrics values are updated
  2. 30 seconds: HPA checks the metrics values
  3. < 2 seconds: pods are created and go into pending state
  4. < 2 seconds: CA sees the pending pods and fires the calls to provision nodes
  5. 10 minutes: the cloud provider provisions the nodes and Kubernetes waits until they are ready (depends on multiple factors, such as provider latency, OS latency, bootstrapping tools, etc.)
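The two totals are simply the sums of the steps listed for each scenario. As a quick check of the arithmetic, the sketch below adds them up, counting each "< 2 seconds" step as a full 2 seconds; the dictionary keys are shorthand labels I made up for the steps.

```python
# Step durations in seconds, taken from the two scenarios above.
best_case = {
    "metrics updated": 30,
    "HPA check": 30,
    "pods created and pending": 2,
    "CA requests nodes": 2,
    "nodes provisioned and ready": 3 * 60,
}
worst_case = {
    "metrics updated": 60,
    "HPA check": 30,
    "pods created and pending": 2,
    "CA requests nodes": 2,
    "nodes provisioned and ready": 10 * 60,
}


def total_minutes(steps):
    """Sum the step durations and convert the total to minutes."""
    return sum(steps.values()) / 60


# 244 seconds, roughly 4 minutes.
print(sum(best_case.values()), round(total_minutes(best_case), 1))
# 694 seconds, roughly 12 minutes.
print(sum(worst_case.values()), round(total_minutes(worst_case), 1))
```

In both scenarios node provisioning dominates the total, so that is the step worth measuring on your own provider.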

Do not confuse cloud provider scalability mechanisms with the CA. The CA works from within your cluster, while a cloud provider’s scalability mechanism (such as ASGs inside AWS) works based on node allocation. It is not aware of what’s taking place with your pods or application. Using them together will render your cluster unstable and its behavior hard to predict.

TL;DR