Project post originally published on the Linkerd blog by Flynn

This blog post is based on a workshop that I delivered way back in September 2023(!) at Buoyant’s Service Mesh Academy. If this seems interesting, check out the full recording!

Linkerd Certificate Management with Vault

Linkerd’s ability to automatically secure communications using mTLS has always been one of its headline features. Of course, mTLS requires certificates, and managing certificates can be very tricky: you need to generate them, rotate them, and distribute them to all the places that need them… while still being careful to avoid exposing any private keys.

For many of the demos we do, we sidestep all this by letting linkerd install silently generate the certificates we need, then ignoring them beyond that. This is vaguely OK for a demo, but it’s totally unacceptable for the real world, where certificates need sensible lifetimes, regular rotation, and private keys that are never exposed.

Ultimately, the way to tackle all of these issues is to use an external CA to issue at least the trust anchor certificate. There are several ways to set that up: in this article, we’ll walk through a fairly real-world scenario, with Vault running outside the cluster as our external CA, and cert-manager and trust-manager inside the cluster handling rotation and distribution of the certificates Linkerd needs.

Note that our goal is not to teach you how to use Vault in particular: it’s to show a practical, relatively low-effort way to actually use external PKI with Linkerd to bootstrap a zero-trust environment in Kubernetes. Many companies already have external PKI set up (whether with Vault or something else); being able to make use of it without too much work is a huge win.

The Setup

In order to demo all this simply, we’ll be running Kubernetes in a k3d cluster. We’ll run Vault in Docker to make things easy to demo, but we will not be running Vault in Kubernetes: Vault will run as a separate Docker container that happens to be connected to the same Docker network as our k3d cluster.

The big win of this setup is that you can run it completely on a laptop with no external dependencies. If you want to replicate this with a cluster in the cloud, that’s no problem: just figure out a reasonable place outside the cluster to run your Vault instance, and make sure that both your Kubernetes cluster and your local machine have IP connectivity to your Vault instance. Everything else should be pretty much the same.

The way all the pieces fit together here is more complex than normal: the Vault server will run in its own Docker container, our k3d cluster will run in Docker on the same network (we’ll call it pki-network), and we’ll administer Vault using the vault CLI from our host machine.

Taken together, this implies that we’ll have to make sure that we can talk to the Vault instance both from inside the Docker network and from our host machine. This mirrors many real-world setups where your Kubernetes cluster is on one network, but you do administration from a different network.

Tools of the trade

You’ll need several CLI tools for this: k3d, kubectl, helm, the vault CLI, step, jq, and the linkerd CLI.

Of course you’ll also need Docker. You can get that from https://docs.docker.com/engine/install/, or you can try Colima from https://github.com/abiosoft/colima instead.
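
If you want a quick way to make sure everything is in place before starting, you can ask each tool for its version (output formats vary; all we care about here is that each command runs at all):

docker version
k3d version
kubectl version --client
helm version
vault version
step version
jq --version
linkerd version --client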

Starting our k3d cluster

Creating the k3d cluster looks horrible, but isn’t that bad:

k3d cluster create pki-cluster \
    -p "80:80@loadbalancer" -p "443:443@loadbalancer" \
    --network=pki-network \
    --k3s-arg '--disable=local-storage,traefik,metrics-server@server:*;agents:*'

(If you already have a cluster named pki-cluster, you’ll need to delete it, or change the name above.)

This command looks complex, but it’s actually less terrible than you might think – most of it is just turning off things we don’t need (traefik, local-storage, and metrics-server), and we also expose ports 80 and 443 to our local system to make it easy to try services out.

At this point, you should be able to run things like kubectl get ns or kubectl cluster-info to verify that you can talk to your cluster. If not, you’ll need to figure out what’s wrong and fix it.
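
For example, both of these are read-only and safe to run against the new cluster:

kubectl cluster-info
kubectl get ns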

Starting Vault

We have a running k3d cluster, so now let’s get Vault going. This is another complex-looking command:

docker run \
       --detach \
       --rm --name vault \
       -p 8200:8200 \
       --network=pki-network \
       --cap-add=IPC_LOCK \
       hashicorp/vault \
       server \
       -dev -dev-listen-address 0.0.0.0:8200 \
       -dev-root-token-id my-token

Breaking this down, we start with docker run since we want to start a container running, and then provide a fair number of parameters: --detach runs the container in the background, --rm cleans it up when it exits, --name vault gives it a name we can refer to later, -p 8200:8200 publishes Vault’s API port to our local host, --network=pki-network attaches it to the same Docker network as our k3d cluster, and --cap-add=IPC_LOCK lets Vault lock memory so that secrets aren’t swapped to disk.

Next is the image name (hashicorp/vault), and then comes the command line for Vault itself: server runs Vault as a server, -dev puts it in development mode (in-memory storage, automatically initialized and unsealed; fine for a demo, never for production), -dev-listen-address 0.0.0.0:8200 makes it listen on all interfaces rather than just localhost, and -dev-root-token-id sets the root token to the (extremely insecure) value my-token.

Once you run that, you’ll have Vault running in a Docker container, hooked up to the same network as the pki-cluster we started a moment ago. (Again, if you already have a container named vault you’ll either need to kill it or change the name above.)
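
Before moving on, it’s worth a quick check that the container actually stayed up (remember that --detach means a startup failure would be easy to miss):

docker ps --filter name=vault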

Next up, we’ll want to use the vault CLI on the local host to configure Vault. We’ll start by setting the VAULT_ADDR environment variable, so that we don’t have to include it in every command. Remember, we’ll be running the vault CLI on our local system, so we can just do this all using our local shell.

export VAULT_ADDR=http://0.0.0.0:8200/

At this point you should be able to run vault status to make sure that all is well.
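
Since we started Vault in dev mode, it comes up already initialized and unsealed, so the status output should show Initialized true and Sealed false:

vault status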

Setting up Vault

While this isn’t a blog about how to operate Vault, we still need to configure Vault to work the way Linkerd needs it to. We’re not going to dive too deep into the details here, but we’ll talk a bit about it as we go.

First up, we’ll authenticate our vault CLI to the Vault server, using the dev-root-token-id that we passed to the server when we started it running.

vault login my-token

Next up, we need to enable the Vault PKI engine, so that we can work with X.509 certificates at all, and configure its maximum allowed expiry time for certificates. Here we’re using 90 days (2160 hours).

vault secrets enable pki
vault secrets tune -max-lease-ttl=2160h pki

After that, we need to tell Vault which URLs to embed in the certificates it issues, so that clients like cert-manager know where to find the CA certificate and the CRL.

vault write pki/config/urls \
   issuing_certificates="http://127.0.0.1:8200/v1/pki/ca" \
   crl_distribution_points="http://127.0.0.1:8200/v1/pki/crl"

Finally, cert-manager will need to present Vault with a token before Vault will actually do anything on its behalf. Vault associates tokens with policies, which are kind of like roles in other systems, so we’ll start by creating a policy that allows us to do anything with the PKI engine…

cat <<EOF | vault policy write pki_policy -
path "pki*" {
    capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
EOF

…and then we’ll get a token for that policy. Later, we’ll feed this token to cert-manager.

VAULT_TOKEN=$(vault write -field=token /auth/token/create \
                          policies="pki_policy" \
                          no_parent=true no_default_policy=true \
                          renewable=true ttl=767h num_uses=0)
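
If you’d like to sanity-check the token before handing it to cert-manager, vault token lookup will take it as an argument and show its policies and TTL:

vault token lookup "$VAULT_TOKEN"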

Creating the Trust Anchor

After all that, we can tell Vault to actually create our Linkerd trust anchor. Note that we set the common name to root.linkerd.cluster.local (the conventional name for the Linkerd trust anchor), we use key_type=ec so we get an ECDSA key, which is what Linkerd expects, and we request a 90-day (2160-hour) TTL to match the maximum lease TTL we configured earlier.

We tell vault write to output only the certificate, which we save so that we can inspect it. Note that the certificate contains no private information, so this is entirely safe.

CERT=$(vault write -field=certificate pki/root/generate/internal \
      common_name=root.linkerd.cluster.local \
      ttl=2160h key_type=ec)
echo "$CERT" | step certificate inspect -

You should see something like this:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 362108562520865298482690188008268341812601922978 (0x3f6d827011b333be6e509a3b13377282ed25a5a2)
    Signature Algorithm: ECDSA-SHA256
        Issuer: CN=root.linkerd.cluster.local
        Validity
            Not Before: Feb 7 23:09:21 2024 UTC
            Not After : May 7 23:09:51 2024 UTC
        Subject: CN=root.linkerd.cluster.local
        Subject Public Key Info:
            Public Key Algorithm: ECDSA
                Public-Key: (256 bit)
                X:
                    1f:ae:48:c9:29:0f:ce:58:43:9d:9a:1a:ba:a9:71:
                    4f:24:e3:31:fc:df:ae:da:ad:b9:04:b6:40:27:29:
                    e8:3e
                Y:
                    f6:01:86:cf:54:57:b0:d7:84:ee:e5:7d:64:6b:28:
                    21:99:7e:5a:bc:a3:92:17:01:51:0e:05:ba:69:97:
                    4f:57
                Curve: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Subject Key Identifier:
                84:9A:D1:67:5F:24:53:22:A2:1D:A5:8A:D9:B1:F9:C8:2D:3F:59:12
            X509v3 Authority Key Identifier:
                keyid:84:9A:D1:67:5F:24:53:22:A2:1D:A5:8A:D9:B1:F9:C8:2D:3F:59:12
            Authority Information Access:
                CA Issuers - URI:http://127.0.0.1:8200/v1/pki/ca
            X509v3 Subject Alternative Name:
                DNS:root.linkerd.cluster.local
            X509v3 CRL Distribution Points:
                Full Name:
                  URI:http://127.0.0.1:8200/v1/pki/crl
    Signature Algorithm: ECDSA-SHA256
         30:44:02:20:46:35:54:a2:48:1e:56:04:7a:26:11:38:95:b3:
         72:e7:b2:08:f8:62:a0:46:3a:cc:5c:dd:ff:66:99:26:4e:84:
         02:20:22:2e:b8:78:7a:47:96:94:b7:db:cc:c7:57:22:d2:c2:
         89:55:bf:42:5e:23:ee:2e:c1:a8:b9:cf:cf:c5:50:a0

Look specifically at the Subject and Issuer, which should both be CN=root.linkerd.cluster.local. Likewise, the X509v3 Subject Key Identifier and X509v3 Authority Key Identifier should have the same key ID.
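
If you’d rather not scan the full dump every time, step can also print a condensed, human-friendly summary of the same certificate (just a convenience; the full output above is what we actually care about):

echo "$CERT" | step certificate inspect --short -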

That’s actually all we need there! Now, on to getting cert-manager installed.

Installing cert-manager

We’ll start by using Helm to install both cert-manager and trust-manager.

helm repo add --force-update jetstack https://charts.jetstack.io
helm repo update

When we install cert-manager, we’ll have it create the cert-manager namespace, and install the cert-manager CRDs too.

helm install cert-manager jetstack/cert-manager \
             -n cert-manager --create-namespace \
             --set installCRDs=true --wait

trust-manager will be installed in the cert-manager namespace, but we’ll explicitly tell it to use the linkerd namespace as its “trust namespace”. The trust namespace is the single namespace from which trust-manager is allowed to read information, and we’re going to need it to read the Linkerd identity issuer.

We don’t need to create the cert-manager namespace here (it already exists), but we do need to create the linkerd namespace manually so that we can use it as the trust namespace.

kubectl create namespace linkerd
helm install trust-manager jetstack/trust-manager \
             -n cert-manager \
             --set app.trust.namespace=linkerd \
             --wait

At this point, if you run kubectl get pods -n cert-manager, you should see both cert-manager and trust-manager running:

NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-768dc45f6-6zkvn   1/1     Running   0          47s
cert-manager-845bf45b88-g94ls             1/1     Running   0          47s
cert-manager-webhook-7d9dddbf74-tkrht     1/1     Running   0          47s
trust-manager-76fc8cbb64-szwch            1/1     Running   0          25s

Configuring cert-manager: the access-token secret

OK, cert-manager is running! Next, we need to configure it to produce the certificates we need. This starts with saving the Vault token we got a while back for cert-manager to use.

kubectl create secret generic \
               my-secret-token \
               --namespace=cert-manager \
               --from-literal=token="$VAULT_TOKEN"

We don’t want to actually look into that secret, but we can describe it to make sure that there’s some data in it, at least. kubectl describe secret -n cert-manager my-secret-token should show a key called token with some data in it:

Name:         my-secret-token
Namespace:    cert-manager
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
token:  95 bytes

Configuring cert-manager: the Vault issuer

Recall that Linkerd needs two certificates: the trust anchor, which is the root of trust for the whole mesh, and the identity issuer, which is signed by the trust anchor and used by Linkerd’s identity system to issue workload certificates.

We’ve already told Vault to create the trust anchor for us: next up, we need to configure cert-manager to create the identity issuer certificate. To do this, cert-manager will produce a certificate signing request (CSR), which it will then hand to Vault. Vault will use the CSR to produce a signed identity issuer certificate for cert-manager.

To make all this happen, we use a cert-manager ClusterIssuer resource to tell cert-manager how to talk to Vault. This ClusterIssuer needs three critical bits of information:

  1. The access token, which we just saved in a Secret.
  2. The address of the Vault server.
  3. The URL path to use to ask Vault for a new certificate. For Vault, this is pki/root/sign-intermediate.

So the address of the Vault server is the missing bit at the moment: we can’t use 0.0.0.0 as we’ve been doing from our local host, because cert-manager needs to talk to Vault from inside the Docker network. That means we need to figure out the address of the vault container within that network.

Fortunately, that’s not that hard: docker inspect pki-network will show us all the details of everything attached to the pki-network, as JSON, so we can use jq to extract the single bit that we need: the IPv4Address contained in the block that also has a Name of vault:

VAULT_DOCKER_ADDRESS=$(
  docker inspect pki-network \
     | jq -r '.[0].Containers | .[] | select(.Name == "vault") | .IPv4Address' \
     | cut -d/ -f1
  )
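
It’s worth echoing the result to make sure we actually captured an address; if this prints nothing, check that the Vault container is named vault and attached to pki-network:

echo "$VAULT_DOCKER_ADDRESS"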

Given the right address for Vault, we can assemble the correct YAML:

sed -e "s/%VAULT_DOCKER_ADDRESS%/${VAULT_DOCKER_ADDRESS}/g" \
    <<EOF > /tmp/vault-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
  namespace: cert-manager
spec:
  vault:
    path: pki/root/sign-intermediate
    server: http://%VAULT_DOCKER_ADDRESS%:8200
    auth:
      tokenSecretRef:
         name: my-secret-token
         key: token
EOF

(If you look at /tmp/vault-issuer.yaml, you’ll see that the server element has the correct IP address in it.) Let’s go ahead and apply that, then check to make sure it’s happy.

kubectl apply -f /tmp/vault-issuer.yaml
kubectl get clusterissuers -o wide

You should see the vault-issuer show with READY true and STATUS “Vault verified”, telling us that cert-manager was able to talk to Vault.

NAME           READY   STATUS           AGE
vault-issuer   True    Vault verified   6s

Now that cert-manager can sign our certificates, let’s go ahead and tell cert-manager how to set things up for Linkerd. First, we’ll use a Certificate resource to tell cert-manager how to use the Vault issuer to issue our identity issuer certificate:

kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 25h
  issuerRef:
    name: vault-issuer
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
EOF

NOTE that this Certificate goes in the linkerd namespace, not the cert-manager namespace! This is because Linkerd actually needs access to the identity issuer, so we have cert-manager create it where it will need to be used.

Running kubectl get certificate -n linkerd at this point should show our Certificate with READY true:

NAME                      READY   SECRET                    AGE
linkerd-identity-issuer   True    linkerd-identity-issuer   11s

and if we kubectl describe secret -n linkerd linkerd-identity-issuer we should see a kubernetes.io/tls Secret with keys of ca.crt, tls.crt, and tls.key:

Name:         linkerd-identity-issuer
Namespace:    linkerd
Labels:       controller.cert-manager.io/fao=true
Annotations:  cert-manager.io/alt-names: identity.linkerd.cluster.local
              cert-manager.io/certificate-name: linkerd-identity-issuer
              cert-manager.io/common-name: identity.linkerd.cluster.local
              cert-manager.io/ip-sans:
              cert-manager.io/issuer-group:
              cert-manager.io/issuer-kind: ClusterIssuer
              cert-manager.io/issuer-name: vault-issuer
              cert-manager.io/uri-sans:

Type:  kubernetes.io/tls

Data
====
ca.crt:   851 bytes
tls.crt:  863 bytes
tls.key:  227 bytes
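
If you’re curious, you can pull tls.crt out of that Secret and inspect it the same way we inspected the trust anchor; you should see identity.linkerd.cluster.local as the Subject and root.linkerd.cluster.local as the Issuer. (This is a quick sketch; it assumes a base64 that accepts -d for decoding, which may be -D or --decode on some systems.)

kubectl get secret linkerd-identity-issuer -n linkerd \
        -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | step certificate inspect --short -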

Finally, we’ll use a Bundle resource to tell trust-manager to copy only the public half of the trust anchor into a ConfigMap for Linkerd to use. Note that Bundles are always cluster-scoped – but also note that the reason we don’t have to specify namespaces for the source and destination is that trust-manager can only read from its trust namespace, in this case linkerd, and it defaults to writing there too.

kubectl apply -f - <<EOF
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: linkerd-identity-trust-roots
spec:
  sources:
  - secret:
      name: "linkerd-identity-issuer"
      key: "ca.crt"
  target:
    configMap:
      key: "ca-bundle.crt"
EOF

At this point, kubectl get bundle (remember, it’s cluster-scoped!) should show us a Bundle named linkerd-identity-trust-roots with SYNCED true:

NAME                           TARGET   SYNCED   REASON   AGE
linkerd-identity-trust-roots            True     Synced   4s
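
You can also peek at the ConfigMap that trust-manager created to confirm that the trust anchor actually landed in the linkerd namespace. (This assumes trust-manager’s default of naming the target ConfigMap after the Bundle, which is what we’re relying on anyway, since Linkerd will look for linkerd-identity-trust-roots.)

kubectl get configmap linkerd-identity-trust-roots -n linkerd \
        -o jsonpath='{.data.ca-bundle\.crt}' \
    | step certificate inspect --short -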

Installing Linkerd

Finally we’re ready to deploy Linkerd! We may as well use Helm for this, too. Start by setting up Helm repos:

helm repo add --force-update linkerd https://helm.linkerd.io/stable
helm repo update

…then install the Linkerd CRDs.

helm install linkerd-crds -n linkerd linkerd/linkerd-crds

After that we can actually install Linkerd! Pay attention to the --set parameters we pass here: identity.issuer.scheme=kubernetes.io/tls tells Linkerd that its identity issuer certificate and key live in a kubernetes.io/tls Secret managed by something else, and identity.externalCA=true tells Linkerd that its trust anchor bundle is likewise maintained externally, so it shouldn’t expect either to be supplied at install time.

These things, of course, are being handled by cert-manager and trust-manager.

helm install linkerd-control-plane linkerd/linkerd-control-plane \
     -n linkerd \
     --set identity.issuer.scheme=kubernetes.io/tls \
     --set identity.externalCA=true

Once that’s done, we can use linkerd check to validate that everything worked:

linkerd check

Note that we see a warning for the identity issuer certificate not being valid for at least 60 days. That’s expected, since we created that with a 48-hour lifespan!
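
If you want to see exactly when the identity issuer certificate expires and when cert-manager intends to renew it, both timestamps are in the Certificate’s status (notAfter and renewalTime are standard cert-manager status fields):

kubectl get certificate linkerd-identity-issuer -n linkerd \
        -o jsonpath='{.status.notAfter}{"\n"}{.status.renewalTime}{"\n"}'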

Summary

After all that, we have Vault generating all our certificates, cert-manager and trust-manager handling rotating and distributing them as needed, and Linkerd consuming them for mTLS everywhere.

Critically, Vault is not running in our cluster, and if you look back over this whole process, the private key for the trust anchor has never been revealed outside of Vault. Using an external CA to isolate key generation lets us dramatically increase the security of the overall system.

Vault, of course, isn’t the only external CA we can use: cert-manager supports a lot of different issuers, including ACME, Vault, Venafi, and many others (see the cert-manager documentation for more about this). We used Vault for this workshop because it’s free to use and relatively easy to set up in Docker, but you’re encouraged to try other kinds of external CAs – ultimately, the critical bit isn’t which one you use, it’s that you’re keeping your secret keys secret.


If you found this interesting, check out the Service Mesh Academy workshop on Linkerd with external CAs using Vault, where you can see the hands-on demo of everything I’ve talked about here! And, as always, feedback is always welcome – you can find me as @flynn on the Linkerd Slack.