CNCF hosts a Kubernetes cluster to run some services for internal purposes (namely; codimd, GUAC, kcp).

The Kubernetes Project announced the ingress-nginx retirement, which also affects the above mentioned Cluster. So we started looking into alternatives.

After some discussions, we decided to continue with gateway-api and its implementation as Envoy Gateway.

Envoy Gateway is an CNCF open source project for managing Envoy Proxy as a standalone or Kubernetes-based application gateway. Gateway API resources are used to dynamically provision and configure the managed Envoy Proxies.

gateway api and ingress-nginx architectures

ingress-nginx works with one LoadBalancer service; the ingress controller receives all traffic and distributes it based on the Ingress object configuration.

Flow chart of the external load balancer working with the ingress controller to receive all traffic and distribute it based on the Ingress object configuration.


On the other hand, gateway api is designed in multiple layers:

A flow chart of the gateway API design

Based on this design, it’s possible to create a Gateway object per HTTPRoute and/or TLSRoute. (Each Gateway creates a LoadBalancer type service on the cluster)

Configuration for the services cluster

It’s possible to configure a shared Gateway object and configure it on multiple HTTPRoutes. This is the closest configuration to the current ingress-nginx deployment with some advantages like:

This folder contains all the settings we implemented:

  1. GatewayClass to use Envoy Gateway
  1. A shared Gateway to serve for Guac, codimd, and kcp.
  2. EnvoyProxy to configure HPA, service type, and other proxy settings.
  3. ReferenceGrants to allow the Gateway to access SSL certificates across namespaces
  4. HTTPRoutes for each service
  5. BackendTLSPolict to handle existing nginx annotations for backend HTTPS connections

How we migrated

We had two options:

  1. Add Envoy Gateway with another public IP address and configure DNS to perform round-robin between ingress-nginx and Envoy
  2. Configure Envoy Gateway to use the current IP address and move the whole traffic in one go.

Although the first option is safer, we chose the second for the simplicity of our operation.

The reserved IP address was pushed to the repo as part of EnvoyProxy configuration:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: ha-envoy-proxy
  namespace: envoy-gateway
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        externalTrafficPolicy: Cluster
        type: LoadBalancer
        patch:
          type: StrategicMerge
          value:
            spec:
              loadBalancerIP: "146.235.214.235" # Reserved IP address on the cloud provider
              ports:
              - name: https-443
                port: 443
                targetPort: 10443
                protocol: TCP
                nodePort: 32050 # Fixed NodePort for external LB backend and firewall configuration
...

Critical: externalTrafficPolicy Setting

We initially encountered connection failures due to externalTrafficPolicy: Local (the default). This setting causes the NodePort to only listen on nodes that have an Envoy pod running. When the Oracle Cloud Load Balancer performed health checks on nodes without pods, they failed, marking all backends as unhealthy.

What about certificates?

We chose to use the existing certificates triggered by ingress-nginx via annotations:

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
...
spec:
  gatewayClassName: envoy
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.cncf.io"
    tls:
      mode: Terminate
      certificateRefs:
      - name: guac-tls
        namespace: guac
        kind: Secret
        group: ""
      - name: auth-dex-tls
        namespace: auth
        kind: Secret
        group: ""
...

However, the certificates have an owner reference to the Ingress object. This means deleting an Ingress would cascade delete the Certificate and its Secret.

Below one-liner, removes the ownerReference from all Certificates that reference an Ingress:

kubectl get certificate -A -o json | jq -r '.items[] | select(.metadata.ownerReferences[]? | .kind == "Ingress") | "\(.metadata.namespace) \(.metadata.name)"' | while read NS NAME 
do 
    kubectl patch certificate $NAME -n $NS --type=json \
      -p='[{"op": "remove", "path": "/metadata/ownerReferences"}]' 
done

Cross-namespace certificate access

Since certificates are stored in different namespaces than the Gateway, we configured ReferenceGrant resources to allow cross-namespace access:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-certs
  namespace: codimd
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: Gateway
    namespace: envoy-gateway
  to:
  - group: ""
    kind: Secret
    name: codimd-tls

This pattern was repeated for each namespace containing certificates.

HTTPRoutes

ingress2gateway helped to prepare the HTTPRoute objects from existing Ingress resources.

We had a special case for one ingress with backend HTTPS configuration:

nginx.ingress.kubernetes.io/backend-protocol: HTTPS
nginx.ingress.kubernetes.io/proxy-ssl-name: api.services.cncf.io
nginx.ingress.kubernetes.io/proxy-ssl-secret: kdp/kcp-ca
nginx.ingress.kubernetes.io/proxy-ssl-verify: "on"

To achieve the same behavior with Envoy Gateway, we created a BackendTLSPolicy:

apiVersion: gateway.networking.k8s.io/v1
kind: BackendTLSPolicy
metadata:
  name: kdp-backend-tls
  namespace: kdp
spec:
  targetRefs:
  - group: ''
    kind: Service
    name: kcp-front-proxy
  validation:
    caCertificateRefs:
    - name: kcp-ca
      group: ''
      kind: Secret
    hostname: api.services.cncf.io


Troubleshooting

TLS handshake failures

If you encounter SSL_ERROR_SYSCALL errors during TLS handshake:

  1. Check Gateway listener: Ensure the HTTPS listener is configured on port 443
  2. Verify certificates are loaded: Check that all referenced certificates exist and are accessible
  3. Check ReferenceGrants: Ensure cross-namespace certificate access is allowed
  4. Review Envoy logs:
kubectl logs -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=shared-gateway

Load balancer health check failures

If the cloud load balancer shows backends as unhealthy:

  1. Verify externalTrafficPolicy: Should be Cluster, not Local
  2. Check NodePort accessibility: Test from a node that the NodePort responds
  3. Review health check configuration: Ensure the LB health check matches the service configuration
  4. Check firewall rules: Verify security groups/NSGs allow traffic from LB subnet to NodePort

Certificate not being served

If OpenSSL can’t retrieve a certificate:

echo | openssl s_client -connect <lb-ip>:443 -servername <hostname> 2>/dev/null | openssl x509 -noout -text

This indicates the certificate isn’t loaded. Check:

  1. Certificate is referenced in Gateway certificateRefs
  2. ReferenceGrant exists for cross-namespace access
  3. Gateway status shows Programmed: True

Day 2 operation on certificates

We had decided to move the certificates later, to narrow the scope of the migration and easily use the current certificates at the time. However, when they expire, we could be in trouble. Here is what you need to do make sure that your certificates are managed by Gateway API + cert-manager:

1. Make sure that cert-manager supports Gateway API:

You need to enable Gateway API support on cert-manager:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://charts.jetstack.io
    targetRevision: v1.17.2
    chart: cert-manager
    helm:
      values: |
        config:
          enableGatewayAPI: true ## Make sure this exists!

2. Update the ClusterIssuer:

Either update the current issuer or create a new one:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - http01:
        gatewayHTTPRoute:
          parentRefs:
          - group: gateway.networking.k8s.io
            kind: Gateway
            name: shared-gateway       ## this is the name of your gateway
            namespace: envoy-gateway   ## where your gateway resides

3. Annotate the Gateway for cert-manager

You need to add the annotation, just like we do for ingress-nginx:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: envoy-gateway
  annotations:
    # needs to match with the ClusterIssuer you created/updated on previous step
    cert-manager.io/cluster-issuer: letsencrypt-prod  
spec:
  gatewayClassName: envoy

4. Separate the listeners

We initially had one listener for all our hosts, but they need to be separated (unless you use DNS solver for a wildcard certificate).

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: envoy-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: envoy
  addresses:
   - type: IPAddress
     value: 146.235.214.235
  listeners:
  - name: https-guac
    protocol: HTTPS
    port: 443
    hostname: guac.cncf.io
    tls:
      mode: Terminate
      certificateRefs:
      - name: guac-tls-gw
        kind: Secret
        group: ""
    allowedRoutes:
      namespaces:
        from: All

    # added for cert-manager HTTP01 solver
  - name: http-guac
    protocol: HTTP
    port: 80
    hostname: guac.cncf.io
    allowedRoutes:
      namespaces:
        from: All

  - name: http-api-guac
    protocol: HTTP
    port: 80
    hostname: api.guac.cncf.io
    allowedRoutes:
      namespaces:
        from: All

    # added for cert-manager HTTP01 solver
  - name: https-notes
    protocol: HTTPS
    port: 443
    hostname: notes.cncf.io
    tls:
      mode: Terminate
      certificateRefs:
      - name: codimd-tls
        kind: Secret
        group: ""
    allowedRoutes:
      namespaces:
        from: All

  - name: http-notes
    protocol: HTTP
    port: 80
    hostname: notes.cncf.io
    allowedRoutes:
      namespaces:
        from: All
...

5. Remove redundant ReferenceGrants

Since the new certificates are created on the same namespace with the Envoy Gateway (shared-gateway in our case), we don’t need the ReferenceGrants anymore. We removed them:

kubectl delete referencegrant --all -A

Conclusion

The migration from ingress-nginx to Envoy Gateway required careful attention to:

The Gateway API’s multi-layer architecture provides better separation of concerns compared to ingress-nginx, though it requires understanding additional resources like ReferenceGrants and BackendTLSPolicy.

To sum it up, we can say that the cloud native world already provided alternatives before the sun setting of ingress nginx. We hope this small insight can help you in your journey of migrating away from ingress nginx.