Guest post by Zerto

In a previous blog post, we talked about container state, and how the read-only, stateless nature of containers forces application architects to rethink what data is stored where to prevent data loss and configuration issues.

Similarly, data protection needs to be built from the ground up to support this radically different approach.

What’s different?

Protecting the application’s containers and persistent storage is only half the battle. 

Sure, you can back up the container image, its running state and configuration from the cluster, and any persistent storage volumes, but that only protects the current state, not the future state.

With new containers being built from new versions of the application continuously, the pipeline that creates those containers is a crucial part of your data protection strategy.

This is because with cloud-native, in-house applications, the build part is as important as the run part. Running the applications takes care of the present, but the future, yet-to-be-unlocked potential lies in the build part. Chances are that as many people and as much investment, possibly more, go into building the next version of the application as into running the current one.

Continuous everything

New code goes through automated pipelines: workflows of many steps that test code, build containers and deploy to production automatically. 
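As a concrete illustration, here is a minimal pipeline sketch written in GitHub Actions syntax; the repository layout, registry name, and `make test` target are placeholders, and the same idea applies to any CI/CD system.

```yaml
# Illustrative pipeline: test, build, and deploy on every push to main.
# Registry name, test command, and manifest paths are placeholders.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test
      - name: Build and push container image
        run: |
          docker build -t registry.example.com/my-app:${{ github.sha }} .
          docker push registry.example.com/my-app:${{ github.sha }}
      - name: Deploy to the cluster
        run: kubectl apply -f k8s/
```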

We need to integrate data protection into those pipelines, so that each new version is protected automatically, with the right protection policies, in a fully self-service manner. That way, we're not just capturing the end result (the container image), but also protecting the fully documented processes and workflows for building that end result: the software factory that produces the images, including all necessary configuration scripts (such as Dockerfiles and Kubernetes YAML files) and documentation.
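In practice, that can be as little as one extra step in the deploy job that applies the protection policy alongside the application manifests. Continuing the hypothetical workflow sketched above (the file name is an assumption):

```yaml
      # Extra step in the deploy job: ship the backup policy with the app,
      # so every new version is protected automatically.
      - name: Apply data protection policy
        run: kubectl apply -f k8s/backup-schedule.yaml
```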

With many different systems in the pipeline, such as code repositories, build servers, and testing tools, keeping track of all of the relevant configuration for a given application is non-trivial, especially across different environments like testing, acceptance, staging, and production. And that's before considering the complexity of multi-cloud, or of using multiple cloud availability zones.

That’s why it’s important to adopt ‘everything as code’. Configuration is written ‘as code’: declarative definitions describing the desired state of cloud resources, application deployment, monitoring, and data protection.
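A data protection policy expressed ‘as code’ can then live in the same repository as the application. As one illustration, here is a scheduled backup written as a Kubernetes manifest using the open-source Velero project's Schedule resource; the namespace, schedule, and retention are assumptions, and other Kubernetes-native data protection tools offer similar declarative policies.

```yaml
# k8s/backup-schedule.yaml (illustrative): hourly backups of the my-app
# namespace, retained for seven days, defined with Velero's Schedule resource.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: my-app-hourly
  namespace: velero
spec:
  schedule: "0 * * * *"    # cron: at the top of every hour
  template:
    includedNamespaces:
      - my-app
    ttl: 168h0m0s          # keep each backup for 7 days
```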

By integrating continuous data protection into the application development and deployment lifecycle, applications are protected not only in production but also throughout development. Naturally, it’s necessary to protect the systems that create the containers as part of the CI/CD pipeline, something that’s often forgotten. By protecting these workloads, the “factory” that produces container images is kept safe.

For data protection, this means backing up not just the container image itself, but also its deployment configuration (the Kubernetes YAML), associated secrets, persistent storage, and the build pipeline: the code repository and the build and test automation. Some of these elements may be virtual machines running in a datacenter, warranting protection by existing data protection solutions.
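To make that concrete, a single backup definition can capture the deployment configuration, secrets, and persistent volumes of an application namespace in one go. A minimal sketch, again using Velero's Backup resource with an assumed namespace and resource list:

```yaml
# Illustrative backup scope: Kubernetes objects plus volume snapshots,
# not just the container image.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-full
  namespace: velero
spec:
  includedNamespaces:
    - my-app
  includedResources:
    - deployments
    - configmaps
    - secrets
    - persistentvolumeclaims
    - persistentvolumes
  snapshotVolumes: true    # also snapshot the persistent storage
```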

These policies change the way engineers interact with data protection. Instead of having to work in a separate user interface, data protection becomes a natural, fully self-service part of the application configuration specification in the deployment pipeline. Data protection is configured by applying a policy to a container build workflow.
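For example, a team might declare the protection tier of its workload as a label in the deployment manifest, and a label-selector-based policy, managed by whatever data protection tool the platform team runs, picks up everything carrying that label. The label name and tier below are assumptions for illustration, not a specific product's API:

```yaml
# Illustrative: the application declares its protection tier as a label;
# a label-selector-based backup policy applies to it automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app
    backup-policy: gold    # hypothetical tier consumed by the protection policy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        backup-policy: gold
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.2.3
```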

These self-service and on-demand aspects of policy-based operations are key benefits of using a data protection as code approach, removing dependencies between development and operations teams.

Want to learn more?

This blog post only scratches the surface, so head on over to this CNCF webinar to learn more.