A complete storage guide for your Kubernetes storage problems

Posted on April 28, 2020


Guest Post by Chad Serino, CEO, AlphaBravo

With the rise of Kubernetes as a method for hosting microservice-based processes, data storage is always a concern. Where it’s being stored. How much capacity we have for it. And how we plan on retrieving it. And the answer to many of these questions seems to be two simple words: persistent storage.

Containers have emerged as a way to port software to wherever it needs to be. Containers with the data needed to run the service deploy to a variety of computer systems, meaning data is now much more portable than before.

But what is persistent storage when it comes to Kubernetes? How can data managers make the best of their Kubernetes systems? And what are the overall benefits of a system like this?

Join us, today, and get ready to find out!

Kubernetes Explained: A Quick Recap

If you manage any part of your day-to-day infrastructure, you’ve likely heard of Kubernetes. This evolving tech has risen through the ranks of tech giants and startups alike to enable agile, scalable application work. But what is Kubernetes? It is a management system for containerized applications that functions across clusters of nodes.

Not simple enough? Alright, then picture this: you’ve got a group of machines. Virtual machines, or VMs, for instance. Then, across these machines, you have a variety of containerized applications. In containerizing an application, we package it together with its libraries, frameworks, and configuration files so that it can run on any system. You take the application you’ve made, put it in a “capsule” and design that capsule so that it can land anywhere and run its app.

So you’ve got however many applications running, encapsulated, on a network of computers in some form. Kubernetes helps you to easily manage these apps in a way that is sustainable and efficient. Users can create and access databases, as well as application data for various applications, with ease via Kubernetes. This increases speed but, more importantly, it also improves efficiency.

The Kubernetes StorageClass lets administrators describe the “classes” of storage they offer. Different classes might map to quality-of-service levels, to backup policies, or to arbitrary policies determined by the cluster administrators.
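To make the idea of storage classes a little more concrete, here is a minimal sketch in Python that builds two StorageClass manifests as plain dictionaries and prints one as JSON (which kubectl accepts alongside YAML). The class names, the provisioner string and the "tier" parameter are hypothetical placeholders; a real cluster would use the provisioner and parameters documented by its storage vendor.

```python
import json


def storage_class(name, provisioner, parameters=None, reclaim_policy="Delete"):
    """Build a StorageClass manifest as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # The provisioner decides which storage backend creates the volumes.
        "provisioner": provisioner,
        # Parameters are opaque to Kubernetes and interpreted by the provisioner.
        "parameters": parameters or {},
        # What happens to the underlying volume once its claim is deleted.
        "reclaimPolicy": reclaim_policy,
    }


# Hypothetical provisioner name and parameters, for illustration only.
PROVISIONER = "example.com/hypothetical-provisioner"

fast = storage_class("fast", PROVISIONER, {"tier": "ssd"}, reclaim_policy="Retain")
standard = storage_class("standard", PROVISIONER, {"tier": "hdd"})

print(json.dumps(fast, indent=2))
```

An administrator would define classes like these once; applications then only ever refer to them by name.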

Kubernetes and Persistent Storage

So we’ve established that, for our purposes at least, containers are ephemeral. A container could shut down today, somewhere on your system, and all of the application data it has ever created or stored inside itself will be lost.

For some applications, this isn’t all that big of a deal. In a lot of others, however, it is. Applications often need to keep their state intact in order to function. Not only that, but they can’t share information with other applications in their wheelhouse if all of their localized data is deleted. In short, a lot rides on the container, and you should be very worried about what might happen to your data if your containers were ever lost.

Let’s turn to an example. If an application relies on database content, it is data-centric. Lose that data, even in part, and your entire app becomes compromised. The safe bet, in these cases, is to store your data somewhere outside of the container, where it stays accessible no matter what happens to the container itself. This is where the idea of “persistent” information comes in: data that lasts because it isn’t tied to volatile containers.

A Closer Look

So, what do we need? Glad you asked:

a container
a place to persistently store information
data to put in that container, in that location

Combine all of these factors, and you have access to data, at any time of the day or night, even after a given number of containers shut down. This makes it extremely valuable to Kubernetes.

Kubernetes provides persistent storage mechanisms for containers, based on Kubernetes persistent volumes (PV). A PV is a cluster-wide storage resource that allows users to access data far beyond the lifespan of any single pod.

Kubernetes Volumes, meanwhile, allow users to mount storage into a pod so that its containers can share data. File system folders, cloud storage buckets: this is any data of your choosing, ready for you to access it. Regular, ephemeral volumes (an emptyDir, for instance) will still be deleted if and when the pod using that particular volume is removed. The persistent volume, however, exists independently of any pod, safe and sound.
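The difference between the two kinds of volume can be sketched as a pod manifest, again built as a Python dictionary. This is an illustration under assumptions: the pod name, image, mount paths and claim name are all made up, and the claim is assumed to exist already in the same namespace.

```python
import json


def pod_with_volumes(name, image, claim_name):
    """Build a Pod manifest with one ephemeral and one persistent volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "volumeMounts": [
                    {"name": "scratch", "mountPath": "/tmp/scratch"},
                    {"name": "data", "mountPath": "/var/lib/data"},
                ],
            }],
            "volumes": [
                # emptyDir lives and dies with the pod.
                {"name": "scratch", "emptyDir": {}},
                # A claim points at a persistent volume that outlives the pod.
                {"name": "data",
                 "persistentVolumeClaim": {"claimName": claim_name}},
            ],
        },
    }


def ephemeral_volumes(pod):
    """Names of the pod's emptyDir volumes."""
    return [v["name"] for v in pod["spec"]["volumes"] if "emptyDir" in v]


def persistent_claims(pod):
    """Names of the claims the pod mounts."""
    return [v["persistentVolumeClaim"]["claimName"]
            for v in pod["spec"]["volumes"] if "persistentVolumeClaim" in v]


# Hypothetical names, for illustration only.
pod = pod_with_volumes("demo", "example/app:1.0", "demo-claim")
print(json.dumps(pod, indent=2))
```

Delete this pod and whatever was written under /tmp/scratch goes with it, while the data behind the claim stays put.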

What’s important, here, is that the PV is typically not backed by locally-attached storage on a worker node. Instead, it’s usually supported by networked storage systems, including EBS and NFS. You may also find it on a distributed storage system, such as Ceph.
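As a rough sketch of what a network-backed PV looks like, the Python below builds a PersistentVolume that points at an NFS export, together with a claim that could bind to it. The server address, export path, names and sizes are assumptions made up for the example.

```python
import json

# Hypothetical NFS server and export path, for illustration only.
NFS_SERVER = "nfs.example.internal"
NFS_PATH = "/exports/data"

persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "nfs-data"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        # NFS can be mounted read-write by many nodes at once.
        "accessModes": ["ReadWriteMany"],
        # Keep the data even after the claim is deleted.
        "persistentVolumeReclaimPolicy": "Retain",
        # The storage lives on the network, not on a worker node's disk.
        "nfs": {"server": NFS_SERVER, "path": NFS_PATH},
    },
}

claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "nfs-data-claim"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": "5Gi"}},
        # An empty class name asks for a pre-created PV instead of
        # triggering dynamic provisioning through a default class.
        "storageClassName": "",
    },
}

print(json.dumps([persistent_volume, claim], indent=2))
```

Note that the claim never mentions NFS at all. Only the PV, which an administrator sets up, knows where the bytes actually live.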

The Significance

When it comes to storage, there’s always been a push to keep implementation details “on the down-low”, as it were. Secret. Need-to-know. Kubernetes persistent storage hides a lot from applications and the people who use them. This pertains, mostly, to storage implementation details, but also includes user information.

A few years back, storage was typically accessed through standard protocols like:

NFS
iSCSI
SMB

These protocols enabled applications and operating systems to access drives and devices from different vendors. The cloud-native environment came along and broke this interoperability down, somewhat. Now, cloud providers offer storage through their own services and systems, which puts storage within reach of more people.

Third-party cloud storage builds environments where users have full access to data they can integrate. Amazon S3 storage, by way of an example, offers a selection of tools and applications, all ready for you to use and manipulate. LINBIT is a strong solution in the persistent storage industry, as well. It’s benefits like custom integration support that make this platform really shine.

The Persistent Volume abstraction has come a long way. It now allows cloud-native applications to connect with other cloud storage systems, as well. Add virtualized storage and open source storage platforms, and you’ve got a storage option that actually gives value back. Because you never have to explicitly integrate your application with those systems, Kubernetes is paving the way to the future. Now applications can request storage, as and when needed, and have it provisioned without knowing where the data is stored, or how.
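Here is a small sketch of what “requesting storage” can look like from the application side. It builds a PersistentVolumeClaim that names only a storage class and a size. The class name "fast" and the claim name are hypothetical, and the sketch assumes the cluster has a class of that name with a provisioner behind it.

```python
import json


def dynamic_claim(name, storage_class_name, size):
    """Build a claim that leaves provisioning to the named storage class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # The application says which class it wants...
            "storageClassName": storage_class_name,
            # ...how it will access the volume...
            "accessModes": ["ReadWriteOnce"],
            # ...and how much space it needs. Nothing about the backend.
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim and class names, for illustration only.
db_claim = dynamic_claim("db-data", "fast", "20Gi")
print(json.dumps(db_claim, indent=2))
```

Which disk, which array or which cloud service ends up backing the claim is up to the class and its provisioner, not the application.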

Cloud storage consumption, overall, provides a much smoother experience than trying to do it all on-site. It also helps to eliminate a lot of your overhead. With less vendor lock-in to worry about, there’s never been a better time than now to migrate over. You might even find you have the resources and motivation to adopt a multi-cloud strategy and expand your storage reach.

With Kubernetes, storage is king. It’s a useful tool for admins, obviously, enabling data retention in persistent forms. And there’s a lot to be said about the benefits of persistent data, in this regard.

“How Can I Manage My Kubernetes Storage More Easily?”

Obviously, when it comes to any data system, the first and most important thing (after storing your data) is managing your data. And, with so much data to store in a Kubernetes system, it’s important to know how to do it efficiently and to a high level of quality.

When specifying pods, for instance, did you know that you can specify how much CPU and RAM each container will require? When resource requests are specified for the containers, the scheduler can make a better call on which node to place the pod. And, in the case where a container also has limits specified, contention for the node’s resources can be handled in a predictable manner.
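To illustrate how requests feed into placement, the sketch below defines a container with requests and limits and then filters a set of nodes down to those with enough free capacity. The filter is a deliberately simplified stand-in for one step of what the scheduler does, not the real scheduler. The node names, free capacities and helper functions are assumptions for the example, and the quantity parsers only handle the few suffixes shown.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity):
    """Convert a memory quantity such as '64Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain bytes


# The container section of a pod spec, with requests and limits.
container = {
    "name": "app",
    "image": "example/app:1.0",  # hypothetical image
    "resources": {
        "requests": {"cpu": "250m", "memory": "64Mi"},
        "limits": {"cpu": "500m", "memory": "128Mi"},
    },
}


def feasible_nodes(free_by_node, container):
    """Return the nodes whose free capacity covers the container's requests."""
    requests = container["resources"]["requests"]
    cpu = parse_cpu(requests["cpu"])
    memory = parse_memory(requests["memory"])
    return [name for name, free in free_by_node.items()
            if parse_cpu(free["cpu"]) >= cpu
            and parse_memory(free["memory"]) >= memory]


# Hypothetical free capacity per node.
free_by_node = {
    "node-a": {"cpu": "100m", "memory": "512Mi"},
    "node-b": {"cpu": "2", "memory": "4Gi"},
}

print(feasible_nodes(free_by_node, container))
```

In this made-up cluster only node-b has enough free CPU for the 250m request, so it is the only candidate left for placement.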

What’s important isn’t finding any sort of objectively-better method, because the truth is there are so many out there to choose from. It comes down to personal preference and trial by fire. You need a system that works for your exact needs.

Maybe something with:

open-source design
scale-out persistent storage
in-kernel data replication
lightning-fast response times
low CPU requirements

Whatever your specifics may be, the aim is to invest in a system that offers persistent storage with low latency to help you keep your operations safe.

Persistent Kubernetes Storage

Kubernetes is a container orchestration tool that has become the standard for how businesses run their applications and manage the data behind them. To call it a “revolution” in the way business apps are being used is sort of underselling it and definitely missing the point. This is the next step in the way we use, access and store our application data. It’s the “evolution”, more than anything else.

As microservices architectures evolve, expect to see app logic and infrastructure walled off from each other. Developers use what they use, allowing them to focus on the job at hand, while Kubernetes abstracts away the actual machines you are managing. With Kubernetes, you can describe your desired amount of memory and computing power, then set the system to use it without worrying about the underlying machines.
