Building a Kubernetes platform: how and why to apply governance and policy

Posted on May 29, 2023 by Andy Suderman


Guest post originally published on Fairwinds’s blog by Andy Suderman

A platform, sometimes called an “internal developer platform,” is a unified infrastructure that allows development teams in a company to deliver applications rapidly and consistently. Out of the box, Kubernetes is very powerful, but it’s too complicated and feature-rich to put in front of development teams as an internal developer platform without some guardrails in place.

Kubernetes is a perfect foundation for building a platform, however. It offers platform engineers many tools that allow them to provide developers with a more streamlined and safe approach to running applications. So, how do you build a platform that offers a great developer experience without getting in the way? In this post, learn how to prevent bad things from happening in your cluster by applying guardrails, as well as how to define RBAC policies for namespaces and users and set default network policies.

Kubernetes Platform Components

Although the philosophy of how you do policy and governance probably applies across all infrastructure, this post focuses on Kubernetes. A Kubernetes platform includes not only Kubernetes, but all of the tooling and processes, as well as the policies and governance you put in place as guardrails in Kubernetes to give developers a “happy path” to deploy applications faster.

Add-Ons

Tools that provide default “out of the box” capabilities that extend the functionality of Kubernetes, such as DNS, TLS, Ingress, logging, tracing, and so on.

Governance

A set of policies that define and enforce best practices in the Kubernetes platform, as well as resource management, scheduling, upgrades, and role-based access control.

Deployment

A “happy path” for deploying new applications into the platform faster and more easily.

Feedback

Detection and notification of issues, as well as suggested remediation, provided to developers in code review.

Governance and Policy: A Three-Phased Approach

When talking about governance and policy in Kubernetes, think of it as a journey. It starts with identifying the policies you need, continues with remediating policy violations, and ends with blocking those violations from entering your cluster(s). Often, teams deploy Kubernetes and everything seems fine at first. Your developers are happy, they are busy coding and shipping apps and services, and it all seems to be working. Over time, you realize that you’ve missed setting up some things in terms of security and best practices. In your platform, people can deploy what they want, when they want. Unless someone goes in and manually looks through all of the settings, you may have no idea that something is going wrong until you see an alert or something breaks.

Use the Fairwinds Insights free tier to get started with Kubernetes governance so you can take more control over your environment and make it more secure, cost efficient, and reliable. When you create a cluster in the free tier, it will automatically install some reports for you. Go into the Install Hub to see what’s currently running in your installation. It’s set up initially in passive mode, so it truly is a read-only environment to get you started. Polaris, an open source policy engine for Kubernetes, is installed by default, as is the Open Policy Agent.

Identify

Once you’re in Insights, do you know what policies you need to write? What violations are you looking for? There are a lot of blog posts and articles about best practices for how to secure your cluster and what policies you need to put in place, but it can be challenging to gather and sift through all that information. Insights includes a lot of policies that are already defined for you to get you started. If you look at just the Polaris checks, you can see there are 34 policies. These checks were created based on experience working with a lot of different clusters and clients, as well as best practices in the NSA Kubernetes hardening guide and other industry standards.

Polaris validates workload configuration against best practices and can tell you whether a workload adheres to a policy you want, such as having labels set or having requests and limits set. You can have cluster-wide policies that apply across your entire cluster, as well as scoped policies that apply only to specific workloads. Insights also includes OPA policies, which are written in Rego, and policy templates you can use to create your own policies. In addition, Insights can apply these policies across multiple clusters in a consistent way.
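To make the idea of a workload check concrete, here is a minimal sketch in Python of the kind of validation described above: it looks at a Deployment-style manifest and reports missing labels and missing requests or limits. This is not how Polaris is implemented or configured; the function name `check_workload` and the `REQUIRED_LABELS` policy are assumptions made up for illustration.

```python
# Hypothetical policy: every workload must carry these labels.
REQUIRED_LABELS = ("app", "team")


def check_workload(manifest):
    """Return a list of human-readable violations for a Deployment-like manifest."""
    violations = []
    metadata = manifest.get("metadata", {})
    name = metadata.get("name", "<unnamed>")

    # Label check: report each required label that is absent.
    labels = metadata.get("labels") or {}
    for label in REQUIRED_LABELS:
        if label not in labels:
            violations.append(f"{name}: missing label '{label}'")

    # Resource check: every container needs cpu and memory requests and limits.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for kind in ("requests", "limits"):
            values = resources.get(kind) or {}
            for resource in ("cpu", "memory"):
                if resource not in values:
                    violations.append(
                        f"{name}/{container.get('name', '?')}: missing {resource} {kind}"
                    )
    return violations


# Example manifest: has requests but no limits, and no 'team' label.
example = {
    "kind": "Deployment",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web",
         "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}}},
    ]}}},
}

for violation in check_workload(example):
    print(violation)
```

A real policy engine adds severity levels, exemptions, and scoping on top of this, but the core of each check is the same: inspect the manifest and report what does not match the policy.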

Remediate

Once you’ve identified the policies you need, it’s time to move on to the remediate step — fixing the things that are broken in your cluster. In Insights, those will appear under Action Items.

[Screenshot: Action Items by severity, top issues, top namespaces, and top workloads]

This provides a great overview of everything that’s happening in your cluster, but you may want to choose one or two things to focus on to begin with. The view allows you to filter, so you could start with all critical action items, or you could filter for everything security-related and start there.

Insights also provides automation rules, which let you automate certain tasks and actions. For example, you could say that if an Action Item is identified and its severity is less than or equal to 0.25, its description is changed to low risk. Or, if an Action Item comes in for a specific namespace, you can have Insights automatically assign it to someone via a Jira ticket or GitHub issue. You could also automatically flag high-severity items for follow-up, but only when they are in production namespaces. The automation rules give you a lot of flexibility and help you manage your Action Items more effectively.
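The logic of the three example rules above can be sketched in a few lines of Python. Insights has its own rule format, and this is not it; the `ActionItem` fields, the owner and namespace tables, the 0-to-1 severity scale, and the `HIGH_SEVERITY` threshold are all assumptions chosen only to show how such rules compose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    """Hypothetical stand-in for an Action Item; field names are assumptions."""
    title: str
    namespace: str
    severity: float  # assumed scale: 0.0 (lowest) to 1.0 (highest)
    description: str = ""
    assignee: Optional[str] = None
    flagged: bool = False


# Hypothetical lookup tables for this sketch.
NAMESPACE_OWNERS = {"payments": "payments-team"}
PRODUCTION_NAMESPACES = {"payments", "checkout"}
HIGH_SEVERITY = 0.7  # assumed cutoff for "high severity"


def apply_rules(item):
    """Apply the three example rules to one Action Item and return it."""
    # Rule 1: low-severity items are relabeled as low risk.
    if item.severity <= 0.25:
        item.description = "low risk"
    # Rule 2: items in a known namespace are assigned to that namespace's owner.
    owner = NAMESPACE_OWNERS.get(item.namespace)
    if owner is not None:
        item.assignee = owner
    # Rule 3: high-severity items are flagged, but only in production namespaces.
    if item.severity >= HIGH_SEVERITY and item.namespace in PRODUCTION_NAMESPACES:
        item.flagged = True
    return item


item = apply_rules(ActionItem("Missing memory limits", "payments", 0.2))
print(item.description, item.assignee, item.flagged)
```

Each rule is a small condition plus an action, which is why a handful of them can cover most of the triage you would otherwise do by hand.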

Between automation rules, Jira and Slack integrations, and CI/CD integration, you should be able to start whittling down the list of policy violations. Focus on one policy at a time and remediate all the Action Items for that policy. Once that is done, you can move on to the next step: blocking.

Blocking

The final step is blocking policy violations. This can be done in two places: the CI/CD integration or the admission controller.

To integrate Insights with GitHub, click on the Repositories tab and add a repository. There are two ways to integrate this into your CI workflow:

Connect to GitHub. This sets up Auto-Scan, which crawls your repo and detects your manifests, Helm charts, and Docker images. It then scans them and surfaces Action Items in GitHub, showing you what is failing Insights policies, so you can fix those issues while you are doing your work in Git. In GitHub, you can also control whether a check is required and choose whether to block something that fails a specific policy.
Connect manually. This is a multi-step process. You put a YAML file at the root of the repo you want to enable, then add a Fairwinds Insights token as a variable in your CI/CD platform. With both in place, you can configure which files and directories in that repo to scan, along with any exemptions for things you don’t want scanned. If you run the scan through CircleCI or a similar system and don’t want violations to fail the build, set the exit code option to false. For both the CI integration and the admission controller, you can see what the default settings are and whether they’ll block deployment.

Once you have your CI/CD integration set up, you can either fail the build or only warn about violations, depending on how you set up branch protections. If you’re in the Blocking phase of a policy, it’s probably best to block the deployment entirely.
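The difference between warning and failing comes down to the exit code your CI step returns. The sketch below shows that decision in Python; it is not the Insights CI integration, and `ci_exit_code` and its `block` parameter are hypothetical names used only for illustration.

```python
import sys


def ci_exit_code(violations, block):
    """Report violations and return the exit code a CI step should use.

    When `block` is False, violations are printed as warnings and the step
    still succeeds (exit code 0). When `block` is True, any violation makes
    the step fail (exit code 1).
    """
    level = "ERROR" if block else "WARNING"
    for violation in violations:
        print(f"{level}: {violation}", file=sys.stderr)
    return 1 if (block and violations) else 0


# Warn-only mode: the build passes even though a violation was found.
print(ci_exit_code(["web: missing memory limits"], block=False))
# Blocking mode: the same violation now fails the build.
print(ci_exit_code(["web: missing memory limits"], block=True))
```

In a real pipeline script you would end with `sys.exit(ci_exit_code(found, block=True))`, where `found` is whatever list of violations your scan produced. Starting in warn-only mode and switching to blocking once a policy is fully remediated matches the phased approach described in this post.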

For an added layer of assurance and security and for workloads that may not be deployed by a CI/CD pipeline, you can enable the Admission Controller. This will block any incoming Kubernetes objects that might violate a policy.
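Under the hood, a validating admission controller receives an `AdmissionReview` object from the Kubernetes API server and answers with `allowed: true` or `allowed: false`. Here is a minimal Python sketch of that decision, without the HTTPS server a real webhook needs. It is not the Insights Admission Controller; the `require_team_label` policy and the function names are assumptions for illustration, while the `admission.k8s.io/v1` request and response fields follow the Kubernetes webhook API.

```python
def require_team_label(obj):
    """Hypothetical policy: every object must have a 'team' label."""
    labels = obj.get("metadata", {}).get("labels") or {}
    if "team" not in labels:
        return ["missing required label 'team'"]
    return []


def review(admission_review, check):
    """Build the AdmissionReview response for an incoming request."""
    request = admission_review["request"]
    violations = check(request["object"])
    # The response must echo the request uid and state whether to admit.
    response = {"uid": request["uid"], "allowed": not violations}
    if violations:
        # The status message is shown to the user whose request was denied.
        response["status"] = {"code": 403, "message": "; ".join(violations)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


incoming = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",  # placeholder value for this sketch
        "object": {"kind": "Pod", "metadata": {"name": "web", "labels": {}}},
    },
}

print(review(incoming, require_team_label)["response"])
```

Because this check runs on every object sent to the API server, it also covers workloads applied by hand, which a CI/CD check never sees.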

Start Applying Kubernetes Governance & Policy

Often, when you tell people to use policy to enforce Kubernetes best practices around cost efficiency, security, and reliability, they’re not sure where to start or what to focus on. The best way to get started is to identify what’s important to you, figure out where you have issues in your cluster, and begin remediating those issues piece by piece. Once you have fixed the issues in your critical categories, you can start enforcing those policies at admission time, and you never see those Action Items again! Watch the next Kubernetes Clinic in the series, where we talk about how to deploy in CI/CD.

Watch the Kubernetes Clinic: Building a Kubernetes Platform – Apply Governance & Policy to learn more and walk through some of this information.

Explore Fairwinds Insights for Free
