Community post originally published on Neon Mirrors by Chip Zoller

In real life, rules often need exceptions, granted on a case-by-case basis. Policy is no different. While preventing objectively “bad” behavior should be commonplace and enforced as widely as possible, there are valid situations where a rule may need to be bent slightly. I’ve covered how some of these exceptions work in Kyverno in the past, but I also wanted to explore the possibility of a “self-driving” exception system, even if only as a concept. In this blog, I’ll share a fun little concept project showing how to use Kyverno to implement a one-time pass code system for granting these exceptions. It’s probably not highly practical, but it gives you a sense of what’s possible and of just how powerful and flexible Kyverno is when it comes to delivering even semi-crazy use cases like this one.

Chances are high you’re using some validation policies in your cluster if you’re reading this article. Chances are also pretty high that at least one of those policies is in Enforce mode which, as you probably know, will prevent a “bad” resource from being created if it violates one or more rules in the policy. There are a few ways to provide exceptions in Kyverno. One is to define an exclude block in a rule and list the exceptions there. Another is to define them centrally in another Kubernetes resource like a ConfigMap. Yet another is to use the formal PolicyException resource introduced in Kyverno 1.9. These are all really useful mechanisms that you should try to employ. But what if, in some situations, you wanted to be mostly hands-off and provide looser control? What if you could let developers and other users know how they can get around a policy, but still with some form of access system? I played around with that idea a bit to see if I could build something like a one-time pass code system for Kyverno. It turns out that, thanks to Kyverno’s flexibility and power, not only can this be done but it really wasn’t that difficult!

At the end of the day, the idea is this: give a user a unique one-time pass code (OTP) if their resource is blocked by a validate rule, make sure that code and its use are recorded so they can be audited, and, obviously, prevent any code from being used more than once.

This is all possible with a combination of two Kyverno policies, one using validation and one using mutation of existing resources. The full sequence of how I wanted this to work is shown below.

Figure 1: Sequence diagram showing the end-to-end flow of events.

And here’s how to put this together.

First, we’ll need a Namespace, which I’m calling platform, to hold the ConfigMap used as the OTP journal. Obviously, if for some reason you wanted to implement this in a real environment, you’d absolutely want to protect it with RBAC so users can’t read it. This ConfigMap has a key called codes with a few starter codes to give you an idea of the formatting and sample contents.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7    
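Both policies compare label values against `parse_yaml(otp.data.codes)`, which turns this block of text into a list of strings. The Python sketch below only illustrates what that list looks like for the simple format used here; `parse_codes` is a hypothetical helper that assumes every entry is a plain `- value` line, and it is not a general YAML parser.

```python
from typing import List


def parse_codes(codes: str) -> List[str]:
    """Rough stand-in for parse_yaml() on a flat '- value' list.

    Assumption: every entry is a single '- <code>' line; nested YAML,
    quoting, and comments are deliberately not handled.
    """
    entries = []
    for line in codes.splitlines():
        line = line.strip()
        if line.startswith("- "):
            entries.append(line[2:].strip())
    return entries


journal = "- ua8v92pg\n- 9akvm2o7"
print(parse_codes(journal))  # ['ua8v92pg', '9akvm2o7']
```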

Next, we need to create the validation policy, which contains two rules.

  1. The invalid-otp rule is universal and not tied to any specific rule or other policy. It checks every newly created Deployment that sets the otp label and denies it if that code is not in the journal, either because it was never issued or because it has already been consumed. This will come into play later.
  2. The host-namespaces-otp rule is an existing rule from the Pod Security Standards set of Kyverno policies, slightly modified to look up codes from the ConfigMap mentioned earlier. You’ll see that the OTP code is actually generated in the message field of this rule. This is important because in the next phase, we’ll harvest this message as the input for the ConfigMap.

Also, notice how I’ve used spec.applyRules: One in this policy and ordered the rules so that invalid-otp comes first. This prevents yet another OTP from being generated if a user specifies either an invalid code or one that has already been consumed. Although an OTP is generated any time a Deployment fails the host-namespaces-otp rule, we only want a code to be generated when the user isn’t trying to supply one in the first place.

Below is the full validation policy.

apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces-otp
spec:
  validationFailureAction: Enforce
  background: false
  applyRules: One
  rules:
    - name: invalid-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
            selector:
              matchLabels:
                otp: "?*"
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
      validate:
        message: The code {{ request.object.metadata.labels.otp }} is invalid or has already been used.
        deny: {}
    - name: host-namespaces-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp || '' }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
      validate:
        message: >-
          Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
          spec.hostIPC, and spec.hostPID must be unset or set to `false`. To get around this,
          you may use a one-time pass code "{{ random('[0-9a-z]{8}') }}" assigned as the value of
          a label with key "otp". Use of this code will be recorded along with your username.          
        pattern:
          spec:
            template:
              spec:
                =(hostPID): false
                =(hostIPC): false
                =(hostNetwork): false
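To make the interplay of the two rules concrete, here is a small Python model of the decision this policy makes on a Deployment CREATE, assuming the `applyRules: One` behavior described above (once invalid-otp denies, the second rule never runs). The function `evaluate` and its return strings are hypothetical stand-ins for illustration, not Kyverno output.

```python
from typing import List, Optional


def evaluate(label: Optional[str], codes: List[str], shares_host_ns: bool) -> str:
    """Model of disallow-host-namespaces-otp for a Deployment CREATE.

    label:          value of the Deployment's 'otp' label, or None if unset
    codes:          journal entries, already parsed into a list
    shares_host_ns: True if hostPID, hostIPC, or hostNetwork is set to true
    """
    # Rule 1, invalid-otp: matches only when an otp label is present ("?*").
    if label:
        # Precondition: label AnyNotIn codes. Used codes carry a suffix,
        # so they no longer match and are rejected here as well.
        if label not in codes:
            return "deny: invalid-otp"  # applyRules: One, nothing else runs
    # Rule 2, host-namespaces-otp: skipped entirely when the label is a valid code.
    if (label or "") not in codes and shares_host_ns:
        return "deny: host-namespaces-otp (new code issued)"
    return "allow"


journal = ["ua8v92pg", "9akvm2o7", "1t1h360g-2023-06-21T15:04:59Z-czoller"]
print(evaluate(None, journal, True))        # blocked, and a new code is offered
print(evaluate("ua8v92pg", journal, True))  # allow
print(evaluate("1t1h360g", journal, True))  # deny: invalid-otp (already used)
```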

The net effect is that if a user tries to create a “bad” Deployment which violates the host-namespaces-otp rule, Kyverno blocks it but returns a message containing the OTP code and instructions for using it. Notice also how I’m warning in the message that using this code will be recorded for audit purposes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Deployment/default/busybox was blocked due to the following policies 

disallow-host-namespaces-otp:
  host-namespaces-otp: 'validation error: Sharing the host namespaces is disallowed.
    The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set
    to `false`. To get around this, you may use a one-time pass code "ee4co4k8" assigned
    as the value of a label with key "otp". Use of this code will be recorded along
    with your username. rule host-namespaces-otp failed at path /spec/template/spec/hostIPC/'
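Kyverno fills in the code with `random('[0-9a-z]{8}')` when it renders the message. The Python sketch below imitates that shape with the standard `secrets` module; `new_code` and `violation_message` are hypothetical names, and nothing here reflects how Kyverno’s `random()` is implemented internally.

```python
import secrets
import string

# Same character set as the policy's random('[0-9a-z]{8}') pattern.
ALPHABET = string.digits + string.ascii_lowercase


def new_code(length: int = 8) -> str:
    """Return a random code shaped like random('[0-9a-z]{8}')."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def violation_message(code: str) -> str:
    """Render the host-namespaces-otp message with the given code embedded."""
    return (
        "Sharing the host namespaces is disallowed. The fields spec.hostNetwork, "
        "spec.hostIPC, and spec.hostPID must be unset or set to `false`. "
        f'To get around this, you may use a one-time pass code "{code}" '
        'assigned as the value of a label with key "otp". '
        "Use of this code will be recorded along with your username."
    )


print(violation_message(new_code()))
```

With 36 possible characters in each of 8 positions there are 36**8 (roughly 2.8 trillion) possible codes. Nothing in these policies checks a freshly generated code against the journal, so a collision is unlikely rather than impossible.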

Next, we need to implement the ConfigMap management system so that OTP codes are added when they are issued and invalidated upon first use. This was the fun part. Let me explain how it works.

First, in the add-otp rule, to dynamically add the OTP codes to the ConfigMap, we parse them out of the Event Kyverno generates whenever it blocks a resource. This Event, just a standard Kubernetes v1 Event, contains the message with the OTP we saw earlier. Since Kyverno can match on these Events (you will need to update Kyverno’s resource filters, which exclude Events by default, to allow this), we can use that specific Event as the trigger for a mutate-existing rule on our ConfigMap.

Note: if you remove the Event resource filter you will increase the processing load on Kyverno which will, in turn, require more resources.

With this OTP code extracted from the message, we can append it to the ConfigMap.
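The add-otp rule boils down to two steps: check that the Event is a policy violation whose message offers a one-time pass code, then take the text between the first pair of double quotes (`split(request.object.message,'"') | [1]`) and append it to `codes` as a new list item. A Python sketch of the same logic follows; the Event is reduced to a plain dict holding only its `reason` and `message` fields, and `extract_otp` and `append_code` are hypothetical helpers.

```python
from typing import Optional


def extract_otp(event: dict) -> Optional[str]:
    """Mimic add-otp's preconditions plus split(message, '"') | [1]."""
    message = event.get("message", "")
    # Precondition 1: only policy-violation Events.
    if event.get("reason") != "PolicyViolation":
        return None
    # Precondition 2: only messages that actually offer a pass code.
    if "one-time pass code" not in message:
        return None
    parts = message.split('"')
    # Index 1 is the text between the first pair of double quotes.
    return parts[1] if len(parts) > 1 else None


def append_code(codes: str, otp: str) -> str:
    """Mimic the patch body '{{ @ }}' followed by a new '- {{ otp }}' line."""
    return f"{codes}\n- {otp}"


sample_event = {
    "reason": "PolicyViolation",
    "message": (
        "Sharing the host namespaces is disallowed. The fields spec.hostNetwork, "
        "spec.hostIPC, and spec.hostPID must be unset or set to `false`. "
        'To get around this, you may use a one-time pass code "ee4co4k8" '
        'assigned as the value of a label with key "otp". '
        "Use of this code will be recorded along with your username."
    ),
}

otp = extract_otp(sample_event)
print(otp)  # ee4co4k8
print(append_code("- ua8v92pg\n- 9akvm2o7", otp))
```

Note that the extraction depends on the code being the first double-quoted string in the message; if the policy message were edited to put another double quote earlier, index 1 would point at the wrong text.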

Second, in the manage-otp rule, we watch for the creation of Deployments that set the otp label and, if that value is valid, we modify its entry in the ConfigMap to record the timestamp and the username of the actor who consumed it. This serves a dual purpose: because this information has been appended, the entry no longer matches the bare code, so the code itself is invalidated. That’s much better than simply deleting the code from the list, since the record remains available for auditing.
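In manage-otp, `replace_all` swaps the bare code for `code-timestamp-username`, which is exactly what makes the code fail the checks afterwards. The Python sketch below models that substitution with `str.replace`; `consume` is a hypothetical helper, and the timestamp format simply copies the one visible in the sample ConfigMap output in this post.

```python
from datetime import datetime, timezone


def consume(codes: str, otp: str, username: str, now: datetime) -> str:
    """Model of manage-otp: replace_all(codes, otp, '<otp>-<time>-<user>').

    `now` should be timezone-aware; it is normalized to UTC and rendered
    like the timestamps in the sample output (e.g. 2023-06-21T15:04:59Z).
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Plain substring replacement, like replace_all: every occurrence changes.
    return codes.replace(otp, f"{otp}-{stamp}-{username}")


before = "- ua8v92pg\n- 9akvm2o7\n- 1t1h360g"
after = consume(before, "1t1h360g", "czoller",
                datetime(2023, 6, 21, 15, 4, 59, tzinfo=timezone.utc))
print(after)
```

Because the replacement is a plain substring match, any other entry containing the same eight characters would be rewritten too; with random codes that is unlikely.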

Below is the second policy with both rules.

apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: manage-otp-list
spec:
  rules:
  - name: add-otp
    match:
      any:
      - resources:
          kinds:
            - v1/Event
          names:
            - "disallow-host-namespaces-otp.?*"
    preconditions:
      all:
      - key: "{{ request.object.reason }}"
        operator: Equals
        value: PolicyViolation
      - key: "{{ contains(request.object.message, 'one-time pass code') }}"
        operator: Equals
        value: true
    context:
    - name: otp
      variable:
        jmesPath: split(request.object.message,'"') | [1]
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
      patchStrategicMerge:
        data:
          codes: |-
            {{ @ }}
            - {{ otp }}            
  - name: manage-otp
    match:
      any:
      - resources:
          kinds:
            - Deployment
          operations:
            - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
          context:
          - name: used
            variable:
              jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}

Try it out with a Deployment that uses a valid, unconsumed code (1t1h360g in this example).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
    otp: 1t1h360g
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

When a valid code is consumed, Kyverno will update the ConfigMap to transform this

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g    

into this

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g-2023-06-21T15:04:59Z-czoller    

Alright, let’s try it out end-to-end and see this whole thing work!

Create a bad Deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Deployment/default/busybox was blocked due to the following policies 

disallow-host-namespaces-otp:
  host-namespaces-otp: 'validation error: Sharing the host namespaces is disallowed.
    The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set
    to `false`. To get around this, you may use a one-time pass code "uq1s17g8" assigned
    as the value of a label with key "otp". Use of this code will be recorded along
    with your username. rule host-namespaces-otp failed at path /spec/template/spec/hostIPC/'

Let’s use the code uq1s17g8 just provided.

I’ll take the same bad Deployment and add that code as the value of a label called otp.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
    otp: uq1s17g8
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
deployment.apps/busybox created

Let’s make sure nobody can use this same code a second time. First, we’ll delete the Deployment we just created.

$ kubectl delete deploy busybox
deployment.apps "busybox" deleted

And try to create the same exact Deployment once again.

$ kubectl apply -f baddeploy.yaml 
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Deployment/default/busybox was blocked due to the following policies 

disallow-host-namespaces-otp:
  invalid-otp: The code uq1s17g8 is invalid or has already been used.

There you can see that the same code uq1s17g8 is now flagged as invalid since it was used once before.

As a privileged cluster admin, we can also check our otp ConfigMap to see who used each code and when.

$ kubectl -n platform get cm otp -o yaml
apiVersion: v1
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g-2023-06-21T15:04:59Z-czoller
    - uq1s17g8-2023-06-21T15:10:18Z-jdoe
kind: ConfigMap
metadata:
  annotations:
    policies.kyverno.io/last-applied-patches: |
      manage-otp.manage-otp-list.kyverno.io: replaced /data/codes
  creationTimestamp: "2023-06-20T13:01:27Z"
  name: otp
  namespace: platform
  resourceVersion: "5147565"
  uid: ed2cce4e-6cf4-4309-b2cc-a2c45493ef4e
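Reading the journal back is mostly a matter of splitting each entry into code, timestamp, and user. A naive split on `-` would not work, because the timestamp itself contains hyphens. The hypothetical `audit` helper below uses a regular expression instead, assuming 8-character `[0-9a-z]` codes and the timestamp format shown above; usernames may contain any characters, including hyphens.

```python
import re
from typing import List, NamedTuple, Optional

# code, optionally followed by -<RFC 3339 UTC timestamp>-<username>
ENTRY = re.compile(
    r"^(?P<code>[0-9a-z]{8})"
    r"(?:-(?P<used_at>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)-(?P<user>.+))?$"
)


class Entry(NamedTuple):
    code: str
    used_at: Optional[str]  # None while the code is still unused
    user: Optional[str]


def audit(codes: str) -> List[Entry]:
    """Parse the journal's '- ...' lines into structured records."""
    entries = []
    for line in codes.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        match = ENTRY.match(line[2:].strip())
        if match:
            entries.append(Entry(match["code"], match["used_at"], match["user"]))
    return entries


journal = "\n".join([
    "- ua8v92pg",
    "- 9akvm2o7",
    "- 1t1h360g-2023-06-21T15:04:59Z-czoller",
    "- uq1s17g8-2023-06-21T15:10:18Z-jdoe",
])

for entry in audit(journal):
    if entry.user:
        print(f"{entry.code} used by {entry.user} at {entry.used_at}")
```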

And there you have it, your very own OTP system for Kyverno which is self-managed and allows for auditing.

Even though this concept probably isn’t very practical for real-world use, I had fun experimenting with the idea to see if it was possible. Who knows, maybe some of you out there can even use this!