Why Kubernetes was inevitable

Posted on August 2, 2021 by Lars Larsson

Guest post originally published on Elastisys’ blog by Lars Larsson

Do you feel that Kubernetes is too complicated? That it’s going to be a waste of time to learn it? I know from experience that you are not alone! Heck, I’ve taught Kubernetes to people, so I am more than aware that there is a learning curve. And it can seem complicated at first, with all the various abstractions and objects that you apparently have to learn. However, if you just go through it all, and look at what Kubernetes will give you, you will see that it’s all very well designed. And that it makes sense. Am I living in a bubble of some kind of blissfully enlightened Kubernetes Nirvana state? Perhaps. But it’s a nice bubble to be in, so grab a cup of coffee, and join me!

Those who do not understand Kubernetes are condemned to reinvent it, poorly

I find that Henry Spencer’s famous quote about UNIX applies to Kubernetes, too. While I agree that there is a learning curve to getting to know the concepts in Kubernetes, you should not underestimate the learning curve you’re in for if you try to implement all those features yourself in your “simpler” system.

I recently hung out with some friends, reminiscing about the good old (bad?) days of when we used configuration management systems exclusively and ran our applications on bare VMs. Those applications were diligently packaged as Debian packages, installed via Ansible, and that’s how we’d manage them, too. Nagios (the guy who liked Munin was ignored by the rest of us) was used to monitor our systems, and logs of course just wound up on disk. Auditing who had done what in the system was never really a thought that crossed our minds. We all just logged in and did whatever. Of course, all of that worked just fine. Until it didn’t.

And when it didn’t, for whatever reason, we’d have tons of stuff to do to fix it. Did the physical hardware die? Did an OS upgrade mess something up? If you’ve been in this field for more than a minute, you know that there are endless possibilities for things to go wrong. So we’d have to start software somewhere else. Migrate data, either by copying it from one place to another or by actually attaching a hard drive to another (not-dead) machine.

Did we do all of that manually? Yes, most of it! At least once. When we grew tired of doing it manually, we would try to automate using scripts. Endless scripts. Brittle ones, if I have to be honest. They did their job most of the time, though. As fun as it was, it was also tedious. And running software across multiple machines required planning on our part. Dynamically rescheduling if errors occurred? Nope!

Kubernetes gets deployment and orchestration right

Kubernetes is, at its core, a distributed key-value store and a bunch of control loops that automate deployment across multiple machines in a cluster. That’s it. The distributed key-value store is fault-tolerant, in that it can survive the loss of members as long as a majority of them remain available. The control loops run continuously, and ensure that if an error occurs, and there is an easy fix for it, they just fix it.
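The majority rule is easy to put into numbers. The following is a minimal sketch of the arithmetic only, not of how etcd is implemented; the function names are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest number of members that still forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for size in (1, 3, 4, 5):
    print(f"{size} members: quorum {quorum(size)}, "
          f"tolerates {tolerated_failures(size)} failure(s)")
```

Note that four members tolerate no more failures than three, which is why clusters of this kind are usually run with an odd number of members.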

Say that a machine dies. Kubernetes notices when the node stops reporting in and, after a configurable grace period, deploys the application components elsewhere. It then updates its own networking so that traffic can still flow to the component that was replaced. It also checks that the component has started correctly and is ready to accept traffic before letting traffic hit it.
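The idea behind such a control loop can be sketched in a few lines. This is a toy in-memory model under loud assumptions: the `Cluster` class, `reconcile`, and `endpoints` are hypothetical names, the placement rule is deliberately naive, and none of it reflects how the real scheduler or controllers are written.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # node name -> is the node healthy?
    nodes: dict[str, bool]
    # replica name -> node it is currently placed on
    placement: dict[str, str] = field(default_factory=dict)
    # replicas that have passed their readiness check
    ready: set[str] = field(default_factory=set)


def reconcile(cluster: Cluster, desired: list[str]) -> None:
    """One pass of a control loop: move observed state toward desired state."""
    healthy = sorted(name for name, ok in cluster.nodes.items() if ok)
    if not healthy:
        return  # nowhere to place anything; try again on the next pass
    for replica in desired:
        node = cluster.placement.get(replica)
        if node is None or not cluster.nodes.get(node, False):
            # Unplaced, or stranded on a dead node: pick the healthy node
            # that currently holds the fewest replicas.
            placed = list(cluster.placement.values())
            target = min(healthy, key=placed.count)
            cluster.placement[replica] = target
            # A moved replica must pass its readiness check again.
            cluster.ready.discard(replica)


def endpoints(cluster: Cluster) -> dict[str, str]:
    """Only replicas that are ready are allowed to receive traffic."""
    return {r: n for r, n in cluster.placement.items() if r in cluster.ready}


cluster = Cluster(nodes={"node-a": True, "node-b": True})
reconcile(cluster, ["web-0", "web-1"])
cluster.ready.update(cluster.placement)  # pretend readiness checks passed
cluster.nodes["node-a"] = False          # a machine dies
reconcile(cluster, ["web-0", "web-1"])
print(cluster.placement)
print(endpoints(cluster))
```

After the second pass, the replica that lived on the dead node has been placed on the surviving one, but it is left out of the endpoints until it is marked ready again.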

On top of all that, it also has the ability to scale out your application components across more of the cluster when it observes that your load requires it. Honestly, doing all that on my own would be a nightmare: I know my limitations. So you can criticize Kubernetes for being complex, but the tasks it solves are complex, too.
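The scaling decision itself boils down to a small formula. The sketch below follows the ratio rule documented for the Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)); it leaves out the tolerance band, stabilization windows, and everything else the real autoscaler does, and the function name and default bounds are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count by the ratio of observed to target load."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% average CPU with a 60% target: scale out.
print(desired_replicas(4, 90, 60))
# 3 replicas at 30% average CPU with a 60% target: scale in.
print(desired_replicas(3, 30, 60))
```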

That time I cobbled together a crappy Kubernetes-like platform and why it sucked

I kind of cobbled together a crappy Kubernetes once. And yes, it sucked. Let me tell you the story. The year was 2014. Kubernetes was just about to come out, but outside of the group of direct contributors, who knew that? Like the rest of the DevOps world, I was smitten with Docker containers. All dependencies packaged into a single package that I can just ship to production?! Wow!

But running on many machines in a cluster was painful. Docker tried to fix this via the awkward Docker Swarm project that is now on life support or dead. I care so little about it that I am not even gonna check which it is. That’s how dead it is. To me at least.

But what was not dead was this beautiful beast called CoreOS. CoreOS was a Linux distribution that was focused on running containers. It featured etcd, the distributed key-value store that is now the brains of modern Kubernetes. Together with a component called “fleet”, it created a distributed init system based on systemd. So much so that how to run containers was defined by writing systemd unit files. This was very flexible, and let you describe dependencies between components easily. And you could store your application configuration in etcd, too! Just push it there, and it could then be read from wherever! Finally, a tool that lets you run clustered containerized applications!

But what about configuration changes? Well, I found a tool that Kelsey Hightower developed, called “confd”, that would help you react to when data in etcd changed. So all I had to do was update configuration in etcd, and that would automatically trigger a component restart. I blogged about this system on LinkedIn and how I used it to set up a WordPress installation. Don’t click that link.

So what sucked about it? Well, data management, for certain. I had to set up a networked file system all by myself, and run it in containers with host access that mounted the GlusterFS file system onto the CoreOS hosts via systemd unit files. Then I had to refer to that particular mount directory in my WordPress systemd unit files. And all components that depended on each other (oh wow, there were several!) had to refer to each other’s systemd units. Did it work? Yes. But it took me forever to set up, and it was a nightmare to maintain. I don’t think I ever did anything smart about logging. And auditing? I don’t think I even had that word in my vocabulary back then. I am confident that I could implement the same system in an afternoon based on Kubernetes instead. And remember, I am a man who knows his limitations. So take that entirely as a sign of how good I think Kubernetes is.

How Kubernetes helps deploy and operate applications

What would be different if I used Kubernetes to implement that WordPress setup? Well, for starters, I clearly communicate my intention with Kubernetes by choosing how to run my various components.

That database that WordPress wants to use? That’s something that needs persistent storage, and I would love for its instances to have stable network addresses and hostnames, because that makes it dead simple to cluster them. So that’s going to be run in a StatefulSet with Persistent Volumes attached. If one instance crashes, it’ll get recreated with the same identity, in a way that is imperceptible to the other instances. A caching layer like memcached, which by definition is only there to be ephemeral in-memory storage? That’s fine to deploy as a Deployment, because all I care about is that Kubernetes will re-create it somewhere if it crashes.
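What “stable hostnames” means in practice: the Pods of a StatefulSet are named after the StatefulSet plus an ordinal, and each one gets a DNS name under the headless Service that governs the set. The sketch below only builds those names as strings; the StatefulSet, Service, and namespace names are hypothetical, and `cluster.local` is merely the common default cluster domain.

```python
def statefulset_hostnames(name: str,
                          service: str,
                          namespace: str,
                          replicas: int,
                          cluster_domain: str = "cluster.local") -> list[str]:
    """DNS names of StatefulSet Pods behind a headless Service.

    Pod names follow the pattern <statefulset-name>-<ordinal>, with
    ordinals counting up from 0.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names for the WordPress database.
for host in statefulset_hostnames("wordpress-db", "wordpress-db", "blog", 3):
    print(host)
```

Because a recreated Pod keeps its ordinal, the other database instances can keep a fixed list of peers like this one in their clustering configuration.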

I want the application to be exposed to the Internet, so visitors can see it. And because WordPress likes to use the “local” file system, it’ll have to be backed by a networked file system if I have multiple application servers. I don’t need these servers to have stable network addresses and such, so a Deployment will be fine, but I have to make sure that the Persistent Volume they all attach is one that allows multiple writers. The application needs to have a Service and an Ingress that fronts that Service, too. The Ingress instructs the Ingress Controller to accept only HTTPS traffic, and it can do SSL termination for me, too. Oh, and via cert-manager, it’ll just do all the Let’s Encrypt magic for me, too. And so on.

As you see, by just knowing which Kubernetes object I need to use, a lot will be taken care of for me. No more custom and bespoke scripts that only I can understand.
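To make the WordPress front end concrete, here is a hedged sketch of those objects, written as Python dictionaries and printed as JSON (which `kubectl` accepts alongside YAML). Every name, the host, the image tag, the mount path, the storage size, and the `letsencrypt-prod` issuer are assumptions for illustration. The cluster also needs a storage class that actually supports `ReadWriteMany`, and forcing HTTPS-only traffic is configured in a controller-specific way that is not shown here.

```python
import json

APP = "wordpress"          # hypothetical application name
HOST = "blog.example.com"  # hypothetical public hostname

# Shared storage that several application servers may write to at once.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": f"{APP}-content"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Interchangeable application servers that all mount the shared volume.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [{
                    "name": APP,
                    "image": "wordpress:6",  # assumed image tag
                    "ports": [{"containerPort": 80}],
                    "volumeMounts": [{
                        "name": "content",
                        "mountPath": "/var/www/html/wp-content",  # assumed path
                    }],
                }],
                "volumes": [{
                    "name": "content",
                    "persistentVolumeClaim": {"claimName": f"{APP}-content"},
                }],
            },
        },
    },
}

# Stable virtual address in front of whichever Pods currently exist.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {"selector": {"app": APP}, "ports": [{"port": 80, "targetPort": 80}]},
}

# TLS-terminating entry point; the annotation asks cert-manager for a certificate.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {
        "name": APP,
        "annotations": {"cert-manager.io/cluster-issuer": "letsencrypt-prod"},
    },
    "spec": {
        "tls": [{"hosts": [HOST], "secretName": f"{APP}-tls"}],
        "rules": [{
            "host": HOST,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": APP, "port": {"number": 80}}},
            }]},
        }],
    },
}

manifest = {"apiVersion": "v1", "kind": "List",
            "items": [pvc, deployment, service, ingress]}
print(json.dumps(manifest, indent=2))
```

Each object states one intention (shared storage, replaceable servers, a stable address, a TLS entry point), and that is where the old pile of bespoke scripts goes away.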

Add to all of this that all kinds of great software now knows how to get deployed in Kubernetes, e.g. via Helm charts or even via Kubernetes Operators, and it becomes so much easier to build an application out of components. Before we had Kubernetes, we had to figure out all kinds of differences between runtime environments ourselves.

Kubernetes was inevitable. Not just because Google wanted to poach AWS cloud customers by standardizing how applications get deployed. But because we were all sick of dealing with the low-level stuff ourselves once we had to start addressing the difficult operational tasks: automatic failover, rolling deployments, rescheduling. And so on.
