Monitoring Kubernetes, part 1: the challenges + data sources

Posted on January 9, 2019

CNCF projects highlighted in this post

Originally published here by Sean Porter, CTO of Sensu

Our industry has long been relying on microservice-based architecture to deliver software faster and safer. The advent and ubiquity of microservices naturally paved the way for container technology, empowering us to rethink how we build and deploy our applications. Docker exploded onto the scene in 2013, and, for companies focusing on modernizing their infrastructure and cloud migration, a tool like Docker is critical to shipping applications quickly, at scale.

But, with that speed comes challenges – containers introduce a non-trivial level of complexity when it comes to orchestration. Enter Kubernetes: an open source container-orchestration system for automating deployment, scaling, and management of containerized applications – the Kubernetes control plane isthe command and control for your infrastructure. Originally launched by Google in 2014, Kubernetes is now maintained by the Cloud Native Computing Foundation (which, incidentally, Google helped form in order to place Kubernetes within the CNCF, to make sure it’d stay free and competitive). If you’re using Docker to containerize your applications, then you’re most certainly using Kubernetes for orchestration. (There are certainly other orchestrators, such as Docker Swarm and Apache Mesos, but Kubernetes has emerged as the leader in container orchestration.)

In the first part of this series, I’ll cover the challenges and the main data sources for monitoring Kubernetes. Later on, I’ll dive deeper into monitoring Kubernetes and Docker deployments, with real-world examples drawing on the data sources outlined below.

Kubernetes monitoring: the challenges

Kubernetes makes it a lot easier for teams to manage containers – scheduling and provisioning them while maintaining a desired state, automatically. A core value prop is it serves as a common platform – Kubernetes can deploy your applications wherever they run, whether that’s AWS, GCP, Azure, or bare metal. Again, with all that power and automation come challenges, especially when it comes to keeping an eye on performance. No matter the size of your deployment, you still need to know how many available resources you have in that deployment, as well as knowing the health of your deployed applications and containers. Just as microservices led us to rethink how we build our applications, Kubernetes requires we change our traditional approach to monitoring – the dynamic nature of container orchestration insists on a subsequently dynamic approach to monitoring.

Here are the challenges as I see them:

In this new dynamic era, your applications are constantly moving.
Before Kubernetes, it was non trivial to have applications distributed across multiple clouds (public and private, as well as different cloud providers). Now that it’s easy to distribute applications, we have a new set of problems.
Much like the move from monolith to microservice architecture, adopting Kubernetes means there are many, smaller pieces to monitor.
You’ve heard about the merits of treating your infrastructure like cattle as opposed to pets. Kubernetes is the epitome of this livestock approach, making it easy to implement high volume and ephemeral infrastructure; just so, keeping track of of your Kubernetes pods and their containers via identifiers such as labels and annotations becomes mission critical.

Kubernetes monitoring: the data sources

Essentially, monitoring tools are collecting Kubernetes data from four sources:

The Kubernetes hosts running the Kubelet. The Kubernetes hosts has limited resources, so it’s especially critical to monitor them. There are a number of ways to get data out of those hosts, but most commonly is to use the Prometheus node exporter to scrape data from the Kubernetes host and expose system resource telemetry data on an HTTP endpoint (such a CPU usage and memory).
The Kubernetes process, AKA Kubelet metrics, which includes metrics for apiserver, kube-scheduler, and kube-controller-manager. These give you details on a Kubernetes node and the jobs it’s running.
The Kubelet’s built-in cAdvisor. There’s a great summary here, but essentially the Kubelet ships with built-in support for cAdvisor, which collects, aggregates, processes, and exports metrics for your running containers. cAdvisor (which also has native support for Docker containers) gives you per-container usage, keeping track of resource isolation parameters and historical resource usage. Because Kubernetes is the control plane, it can designate how much memory is being used, and leverages cAdvisor to keep track.
kube-state-metrics, which gives you information at the cluster level – a big picture view of what’s happening on your Kubernetes cluster, such as all the pods you have configured and their current state. kube-state-metrics hits all Kubernetes services and collects information on their current state, such as how many containers are running, how many are in a particular state, whether any are indicating that they’re unhealthy or that we’re at capacity, etc. From the README, kube-state-metrics “listens to the Kubernetes API server.”

Next up: container states and collecting data with Prometheus

If you’re keeping track at home, you may noticed you can monitor all four of these data sources with Prometheus. You may also have noticed that we’re only talking about monitoring Kubernetes but not the applications running on it (and, this may line up with everything you’ve heard about Kubernetes monitoring). In my next post, I’ll illustrate Kubernetes and Docker monitoring with Prometheus, discuss why it fits well within the Kubernetes ecosystem, and identify the gaps.

Hyderabad, India

Kubernetes monitoring: the challenges

Kubernetes monitoring: the data sources

Next up: container states and collecting data with Prometheus