Project post originally published on the Flux Blog by Daniel Holbach

Next up in our blog series about Flux Security: how we moved to the Pod Security Standard “restricted”, the background you need to know, and how that makes things safer for you.

Since version 0.26 of Flux we are applying

[..] the restricted pod security standard to all controllers. In practice this means:

Flux also enables the Seccomp runtime default across all controllers. Why is this important? Well, the default seccomp profile blocks key system calls that can be used maliciously, for example to break out of the container isolation. The recently disclosed kernel vulnerability CVE-2022-0185 is a good example of that.

Pod Security Standards definition

Kubernetes defined three policies in its Pod Security Standards. They range from

We are very pleased that all Flux controllers were moved to Restricted, as that offers the highest level of security for you.

We recommend checking out the upstream Kubernetes documentation on Pod Security Standards, as it gives a good overview of all the security features enabled. It also shows which restrictions were added in which Kubernetes release, so with every Kubernetes release you automatically benefit from new upstream security improvements.
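If you want to hold your own workloads to the same standard, one option is the built-in Pod Security Admission controller. The following is a minimal sketch, assuming a cluster where that controller is enabled; the namespace name is hypothetical and this is not taken from the Flux manifests.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-apps   # hypothetical namespace name
  labels:
    # Reject pods that violate the "restricted" standard.
    pod-security.kubernetes.io/enforce: restricted
    # Track the restrictions of the cluster's current Kubernetes version.
    pod-security.kubernetes.io/enforce-version: latest
    # Additionally surface violations as warnings to the client.
    pod-security.kubernetes.io/warn: restricted
```

Setting the version to latest means new restrictions apply as the cluster is upgraded; pinning it to a specific release instead trades that for predictability.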

Note: As of v1.24, Kubernetes still runs all workloads with seccomp in unconfined mode by default, in other words, disabled. Docker, on the other hand, has had seccomp enabled by default for years.

There are discussions about changing the Kubernetes default in v1.25 so that all workloads are set to RuntimeDefault unless they opt out. This would be based on the SeccompDefault feature gate being enabled from that version onwards.
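Until that becomes the default, cluster operators can opt in per node. The following is a rough sketch of a kubelet configuration that does so, assuming a Kubernetes version where the SeccompDefault feature gate is still required; check the documentation for your version before relying on it.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Assumption: the gate must be switched on explicitly in this version.
  SeccompDefault: true
# Use the container runtime's default seccomp profile for workloads
# that do not specify one, instead of running them unconfined.
seccompDefault: true
```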

Note: If you are an OpenShift user, you might run into this issue (related upstream report). The workaround right now is to remove the seccomp profile as described in these instructions.

seccomp and RuntimeDefault

Seccomp is short for “Secure Computing”. It refers to a facility in the Linux kernel that restricts which system calls are available to a given process. Right now there are more than 300 system calls available, e.g. read to read from a file descriptor or chmod to change the permissions of a file. The more syscalls you block, the smaller the attack surface of your application, as a rogue process will only be able to do what you allowed.

In its first inception seccomp was introduced into Linux in 2005, to Docker in version 1.10 (Feb 2016) and to Kubernetes in version 1.3 (Jul 2016). So while the technology has been around for a while and you could handcraft your own seccomp profiles, the challenge has always been striking the right balance: if you are too generous in your filter, it won’t guard against malware effectively – if you are too strict, your application might not work.
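For reference, a handcrafted profile is attached to a pod through the Localhost profile type. This is only a sketch: the file name is hypothetical, and it assumes the profile has already been placed in the kubelet's seccomp directory on every node.

```yaml
securityContext:
  seccompProfile:
    type: Localhost
    # Hypothetical file, resolved relative to the kubelet's seccomp
    # directory on the node (by default /var/lib/kubelet/seccomp).
    localhostProfile: profiles/my-app.json
```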

All container runtimes come with a default seccomp profile. Docker Desktop for example blocks around 44 system calls. In Kubernetes you can enable the seccomp profile RuntimeDefault for your pod like so:

    seccompProfile:
      type: RuntimeDefault
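To show where that setting lives, here is a complete Pod manifest as a minimal sketch. The pod name and image are placeholders, and the container-level fields are the ones the upstream restricted standard asks for; this is not a copy of the Flux controller manifests.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo                        # hypothetical name
spec:
  securityContext:
    # Pod-level setting: applies to every container in the pod.
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        # Container-level settings required by the restricted standard.
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
```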

All Flux controllers have this implemented as well now!

By adopting both changes, we further restrict the permissions that Flux requires in order to operate. This, alongside other changes we are working on, translates into a reduced attack surface, which may limit the impact of any CVEs that surface in our code base or in our supply chain.

Further reading

If you would like to understand the concepts in this blog post better, you might want to check out these blog posts (in addition to the docs referred to above):

Talk to us

We love feedback, questions and ideas, so please let us know your personal use-cases today. Ask us if you have any questions and please

See you around!