Learn how Kubernetes-native traffic management and SD-WAN integration can deliver consistent security, observability, and performance across distributed clusters.

The challenge of distributed Kubernetes networking

Modern businesses are rapidly adopting distributed architectures to meet growing demands for performance, resilience, and global reach. This shift is driven by emerging workloads that demand distributed infrastructure: AI/ML model training distributed across GPU clusters, real-time edge analytics processing IoT data streams, and global enterprise operations that require seamless connectivity across on-premises workloads, data centers, cloud providers, and edge locations.

Today businesses are increasingly struggling to ensure secure, reliable and high-performance global connectivity while maintaining visibility across this distributed infrastructure. How do you maintain consistent end-to-end policies when applications traverse multiple network boundaries? How do you optimize performance for latency-sensitive critical applications when they could be running anywhere? And how do you gain clear visibility into application communication across this complex, multi-cluster, multi-cloud landscape? 

This is where a modern, integrated approach to networking becomes essential, one that understands both the intricacies of Kubernetes and the demands of wide-area connectivity. Let’s explore a proposal for seamlessly bridging your Kubernetes clusters, regardless of location, while intelligently managing the underlying network paths. Such an integrated approach solves several critical business needs:

In this post we discuss how to interconnect Cilium with a Software-Defined Wide Area Network (SD-WAN) fabric to extend Kubernetes-native traffic management and security policies into the underlying network interconnect. Learn how such integration simplifies operations while delivering the performance and security modern distributed workloads demand.

Towards an intelligent network fabric 

Imagine a globally distributed service deployed across dozens of locations worldwide. Latency-critical microservices are deployed at the edge, critical workloads run on-premises for data protection, while elastic services leverage public cloud scalability. These components must constantly communicate across cluster boundaries: IoT streams flow to central management, customer data replicates across regions for sovereignty compliance, and real-time analytics span multiple sites.

Bridging Kubernetes and SD-WAN with Cilium

Enter Cilium, a universal networking layer connecting Kubernetes workloads, VMs, and physical servers across clouds, data centers, and edge locations. Simply mark a service as “global” and Cilium ensures its availability throughout your distributed multi-cluster infrastructure (Figure 1). But even single-cluster Kubernetes deployments may benefit from an intelligent WAN interconnect, when different nodes and physical servers of the same cluster may run at multiple geographically diverse locations (Figure 2). No matter at which location a service is running, Cilium intelligently routes and balances load across the entire deployment.

Yet a critical gap remains: controlling how traffic traverses the underlying network interconnect. Modern wide-area SDNs like a modern SD-WAN implementation (such as Cisco Catalyst SD-WAN) would easily deliver the intelligent interconnect these services need, by providing performance-guarantees for SD-WAN tunnels between sites with traffic differentiation. Unfortunately, currently leveraging these capabilities in a Kubernetes-native way remains a challenge.

Figure 1: Multi-cluster scenario: An SD-WAN connects multiple Kubernetes clusters.

Figure 1: Multi-cluster scenario: An SD-WAN connects multiple Kubernetes clusters.

Figure 2: Single-cluster scenario: An SD-WAN fabric interconnects geographically distributed nodes of a single Kubernetes cluster.

Figure 2: Single-cluster scenario: An SD-WAN fabric interconnects geographically distributed nodes of a single Kubernetes cluster.

We suggest leveraging the concept of a Kubernetes operator to bridge this divide. Continuously monitoring the Kubernetes API, the operator could translate service requirements into SD-WAN policies, automatically mapping inter-cluster traffic to appropriate network paths based on your declared intent. Simply annotate your Kubernetes service manifests, and the operator handles the rest. For the purposes of this guide we will use a minimalist SD-WAN operator.Other SD-WAN operators (such asAWI Catalyst SD-WAN Operator)  offer varying degrees of Kubernetes integration. 

The role of a Kubernetes operator

Need guaranteed performance for business-critical global services? One service annotation will route traffic through a dedicated SD-WAN tunnel, bypassing congestion and bottlenecks. Require encryption for sensitive data flows? Another annotation ensures tamper-resistant paths between clusters. In general, such an intelligent cloud SD-WAN interconnect would provide the following features:

A Kubernetes operator can bring these capabilities, and many more, into the Kubernetes ecosystem, maintaining the declarative, GitOps-friendly approach cloud-native teams expect.

Enforcing traffic policies with Cilium and Cisco Catalyst SD-WAN

In this guide, we demonstrate how an operator can enforce granular traffic policies for Kubernetes services using Cilium and Cisco Catalyst SD-WAN. The setup ensures secure, prioritized routing for business-critical services while allowing best-effort traffic to use default paths. 

We will assume that SD-WAN connectivity is established between the clusters/nodes so that the SD-WAN interconnects all Kubernetes deployment sites (nodes/clusters) and routes pod-to-pod traffic seamlessly across the WAN. We further assume Cilium is configured in Native Routing Mode so that we have full visibility into the traffic that travels between the clusters/nodes in the SD-WAN. 

Once installed, the SD-WAN operator will automatically generate SD-WAN policies based on your Kubernetes service configurations. This seamless integration ensures that your network policies adapt dynamically as your Kubernetes environment evolves.

To illustrate, let’s look at a demo environment (see Figure 3) featuring:

In this setup, as you define or update Kubernetes services, the operator will automatically program the underlying SD-WAN fabric to enforce the appropriate connectivity and security policies for your workloads.

Figure 3: A simplified demo setup with a single-cluster Kubernetes with two nodes interconnected with Cilium and a modern SD-WAN implementation (such as Cisco Catalyst SD-WAN). Note: This pattern extends seamlessly to multi-cluster deployments.

Figure 3: A simplified demo setup with a single-cluster Kubernetes with two nodes interconnected with Cilium and a modern SD-WAN implementation (such as Cisco Catalyst SD-WAN). Note: This pattern extends seamlessly to multi-cluster deployments.

End-to-end policy enforcement example

Within the cluster, we will deploy two services, each with specific connectivity and security requirements:

By tailoring network policies to the unique needs of each service, we achieve both operational efficiency for routine workloads and robust protection for sensitive business data.

By default, all traffic crossing the cluster boundary uses the default tunnel. In order to ensure that the Business Service uses the Business WAN we just need to add a Kubernetes annotation to the corresponding Kubernetes Service:

code

How does this work? The SD-WAN operator watches Service objects, extracts endpoint IPs/ports from the Business Service pods, and dynamically programs the SD-WAN to enforce the business tunnel policy (see Figure 4).

Figure 4: The Kubernetes objects read by the SD-WAN operator to instantiate the SD-WAN rules for the business service.

Figure 4: The Kubernetes objects read by the SD-WAN operator to instantiate the SD-WAN rules for the business service.

Future directions: Observability and SLO awareness

Meanwhile, Figure 5 illustrates the SD-WAN configuration generated by the SD-WAN operator. The configuration highlights two key aspects:

Together, these mechanisms provide robust security and fine-grained control over service-specific traffic flows within and across your Kubernetes clusters.

Figure 5: The SD-WAN configuration issued by the operator

Figure 5: The SD-WAN configuration issued by the operator

Conclusion

By using a Kubernetes operator it is possible to integrate Cilium and a modern SD-WAN implementation (such as Cisco Catalyst SD-WAN) into a single end-to-end framework to intelligently connect distributed workloads at controlled security and performance. Key takeaways:

This demo operator, however, demonstrates only the first step by providing just the bare intelligent connectivity features. Future work includes  exploring end-to-end observability, monitoring and tracing tools. Today, Hubble provides an observability layer to Cilium that can show flows from a Kubernetes perspective, while Cisco Catalyst SD-WAN Manager and Cisco Catalyst SD-WAN Analytics provide extended network observability and visibility, however the missing bit is a single plane of glass. In addition, further future work might consider, exposing the SD-WAN SLOs to Kubernetes for automatic service mapping, and extending the framework to new use cases.

Learn more

Feel free to reach out to the authors at the contact details below. Visit cilium.io to learn more about Cilium. More details on Cisco Catalyst SD-WAN can be found on:
https://www.cisco.com/site/us/en/solutions/networking/sdwan/catalyst/index.html

The creators of Cilium and Cisco Catalyst SD-WAN are also hiring! Check out https://jobs.cisco.com/jobs/SearchJobs/sdwan or https://jobs.cisco.com/jobs/SearchJobs/isovalent for their listings.

Authors:

Gábor Rétvári

Tamás Lévai