Data on Kubernetes is a growing field, with databases, object stores, and other stateful applications moving to the platform. The Data Protection Working Group focuses on data availability and preservation for Kubernetes, including backup, restore, remote replication, and ways to facilitate and orchestrate these processes. At the Data Protection Working Group Deep Dive session at KubeCon + CloudNativeCon London (April 2, 2025, at 4:15 p.m.), Xing Yang, Cloud Native Storage Tech Lead at Broadcom/VMware, and I will cover topics including:
· The need for Kubernetes data protection
· Changed Block Tracking updates
· Volume Group Snapshots updates
· Our in-progress white paper, “Best Practices to Prepare Kubernetes Applications for Data Protection”
· The structure of the Data Protection Working Group
· How to get involved with the Data Protection Working Group
We’ll discuss these topics in a bit more depth below!
The Need for Kubernetes Data Protection
Kubernetes has evolved from its original mission of being an orchestrator for stateless containers that use external services for data storage to a platform that supports data storage and state within Kubernetes clusters. State can be stored in Persistent Volumes (PVs) but also in Kubernetes resources as native Kubernetes applications take advantage of the Kubernetes API server to store their working information. The evolution of Kubernetes into a stateful platform has created the need to protect data stored in Kubernetes against loss, corruption, and other threats such as ransomware attacks. The Data Protection Working Group published a white paper that outlines when you need data protection in Kubernetes. We will cover the high points during our session at KubeCon, but we invite you to read the whole paper here: https://github.com/kubernetes/community/blob/master/wg-data-protection/data-protection-workflows-white-paper.md
Changed Block Tracking Updates
The Data Protection Working Group has been working on adding Changed Block Tracking (CBT) to Kubernetes and the Container Storage Interface (CSI). CBT improves the performance of backup and replication for large volumes by identifying the blocks that changed between two snapshots, so that only those blocks need to be copied. At KubeCon, we’ll give an update on CBT, including what it does and the status of its implementations. Veeam Kasten has been a leader in using proprietary CBT systems for Kubernetes data protection, and we are proud to have participated in creating the Kubernetes Changed Block Tracking API, which is currently in beta with Kubernetes 1.32.
Changed Block Tracking KEP: https://github.com/kubernetes/enhancements/issues/3314
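To make the idea concrete, here is a minimal Python sketch of an incremental backup driven by a changed-block list. It is a toy model, not the Kubernetes or CSI API: the function names and the 4-byte block size are hypothetical, and the changed-block list is computed by comparing data only as a stand-in for the list a storage system would report through CBT.

```python
BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks that differ between two snapshots.

    With real CBT the storage system reports this list; comparing the
    data here only stands in for that report.
    """
    if len(old) != len(new):
        raise ValueError("snapshots must be the same size")
    return [
        offset // block_size
        for offset in range(0, len(new), block_size)
        if old[offset:offset + block_size] != new[offset:offset + block_size]
    ]


def incremental_backup(new: bytes, blocks: list[int],
                       block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Copy only the changed blocks out of the newer snapshot."""
    return {b: new[b * block_size:(b + 1) * block_size] for b in blocks}


def restore(base: bytes, increment: dict[int, bytes],
            block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the newer snapshot from the base backup plus the increment."""
    result = bytearray(base)
    for b, data in increment.items():
        result[b * block_size:b * block_size + len(data)] = data
    return bytes(result)


snap1 = b"AAAABBBBCCCCDDDD"  # first snapshot (already backed up in full)
snap2 = b"AAAAXXXXCCCCDDDY"  # second snapshot, two blocks modified

blocks = changed_blocks(snap1, snap2)
increment = incremental_backup(snap2, blocks)

print(blocks)                                # [1, 3]
print(restore(snap1, increment) == snap2)    # True
```

In this sketch only two of the four blocks are copied for the second backup. On a large volume with a small change rate, that difference is where the backup and replication savings come from.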
Volume Group Snapshots Updates
Volume Group Snapshots are another Kubernetes enhancement that supports data protection. When an application uses multiple volumes, it is important to take a consistent snapshot of all of them. Taking snapshots one by one while the application is running may capture each volume at a different moment, leaving inconsistencies between volumes and producing a backup that is not usable. Volume Group Snapshots address this by snapshotting a set of volumes together at the same point in time.
Volume Group Snapshots KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot
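The consistency problem described above can be illustrated with a small Python simulation. This is a toy model under stated assumptions, not the Kubernetes API: the two "volumes" are in-memory lists, and the hypothetical application writes every transaction to both a data volume and an index volume.

```python
# Two volumes used by one hypothetical application: each transaction
# writes a record to "data" and a matching entry to "index".
volumes: dict[str, list[int]] = {"data": [], "index": []}


def write_transaction(txn_id: int) -> None:
    """Simulate the application writing one transaction to both volumes."""
    volumes["data"].append(txn_id)
    volumes["index"].append(txn_id)


def snapshot(name: str) -> list[int]:
    """Capture the current contents of a single volume."""
    return list(volumes[name])


def is_consistent(snap: dict[str, list[int]]) -> bool:
    """A backup is usable here only if both volumes hold the same transactions."""
    return snap["data"] == snap["index"]


write_transaction(1)

# Snapshots taken one by one: the application keeps writing in between.
individual = {"data": snapshot("data")}
write_transaction(2)
individual["index"] = snapshot("index")

# Group snapshot: every volume is captured at the same point in time.
group = {name: snapshot(name) for name in volumes}

print(is_consistent(individual))  # False
print(is_consistent(group))       # True
```

In the one-by-one case the index refers to a transaction that the data snapshot does not contain, which is the kind of mismatch a group snapshot is meant to prevent.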
Best Practices to Prepare Kubernetes Applications for Data Protection White Paper
Kubernetes now has many of the required building blocks for data protection, including volume snapshots and CBT. Many applications can be backed up and restored without any changes, but as applications become more sophisticated and more tightly integrated with Kubernetes, it’s important to ensure they can survive a disaster. Determining the right data protection strategy to meet your recovery objectives and budget is an important step in this journey. Once your strategy is defined, you need to understand how your applications interact with it and whether any application changes are required to support it. We have begun work on a new white paper, “Best Practices to Prepare Kubernetes Applications for Data Protection”, to cover what an application developer or Kubernetes administrator needs to know to prepare their applications for data protection.
During our KubeCon session, we’ll discuss topics such as:
· Crash consistency
· Application consistency
· Restore issues
· RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
· Designing operators for backup and restore
· Remote replication considerations
· Threat models
· Best practices for applications
· Best practices for operators
This is an ongoing project, and we invite everyone to join us and share their needs, experiences, and ideas on how best to prepare applications for data protection. If you are interested, please contact us on the Data Protection Working Group Slack channel.
The Structure of the Data Protection Working Group
The Data Protection Working Group consists of participants who use Kubernetes and who create applications, storage, and data protection solutions for the platform. We’re open to anyone who is interested in protecting their data on Kubernetes. Come join our session to exchange ideas, learn new things, and find out how to contribute and let us know what your needs are!
To learn more about the Kubernetes Data Protection Working Group, see: https://github.com/kubernetes/community/tree/master/wg-data-protection
Come See Us at KubeCon!
Stop by Veeam’s booth S181 for in-person demonstrations of Veeam Kasten, our data protection product, and to talk to our subject matter experts.
Kanister
Kanister is an open source framework for data protection and management on Kubernetes. It is a CNCF Sandbox project and can be found at: https://www.kanister.io/