University of Wisconsin–Madison enhances multi-cluster visibility and control with Cilium
Challenge: Scaling Secure Research Networking at UW–Madison
The University of Wisconsin–Madison is one of the nation’s premier research universities, ranked 8th nationally for research volume with $1.3 billion in research expenditures. As a top R1 research institution, UW–Madison serves thousands of researchers across nearly all academic disciplines and is committed to the Wisconsin Idea: improving people’s lives beyond campus boundaries.
When developers at UW–Madison began containerizing research workloads, early environments struggled to meet regulatory standards. Kubernetes was identified as a secure, compliant, auditable solution, but the networking layer became a constraint as infrastructure scaled.
“We went from Flannel to Weave to Calico, each time hitting limitations,” explains Cory Sherman, DevOps engineer at UW–Madison.
Early CNI implementations lacked the network policy enforcement and visibility needed for sensitive, federally regulated workloads in a rapidly changing environment.
Key challenges included:
- Cluster Instability: Developers on shared clusters frequently broke each other’s environments.
- Resource Constraints: Operating on a “student-sized budget” while serving 25+ clusters.
- Compliance Requirements: Federal regulations demanded fine-grained network controls and comprehensive visibility.
- Limited L7 Visibility: Existing CNIs couldn’t provide application-layer network insights.
The team re-architected their environment with dedicated clusters for staff, a central development QA cluster, and multiple staging and production clusters per “stream,” enabling collaborative testing in the central QA cluster without disrupting individual work. This separation stabilized workloads and created a foundation for compliant isolation.
By the numbers
25+
Kubernetes clusters managed across research and academic environments
zero
sidecar overhead across all clusters using Cilium’s service mesh
1
centralized observability platform via Cilium Hubble and Prometheus
Solution: Turning compliance challenges into a secure, scalable network
As regulatory requirements tightened, UW–Madison needed a network layer that could deliver fine-grained control without sacrificing performance. The team migrated from Calico to Cilium, gaining an eBPF-powered data plane with Layer 7 observability and workload identity-based policies that aligned with compliance-driven environments.
“We needed fine-grained network controls and comprehensive visibility into application-layer traffic,” Sherman explains. “Cilium’s L7 capabilities allowed us to meet regulatory requirements while giving researchers the network performance they needed.”
Cilium decouples security from IP addressing, allowing policies to move with workloads across environments. This simplified policy management and enabled faster scaling without manual rule updates. Its cluster-wide policy model ensures consistent enforcement, while Cluster Mesh extends those controls across multiple Kubernetes clusters, delivering auditability and consistency.
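An identity-based policy of this kind might look like the following sketch. The namespace, labels, port, and HTTP rule are illustrative assumptions, not UW–Madison’s actual configuration; the point is that selection happens by workload labels rather than IP addresses, with optional L7 (HTTP-aware) enforcement:

```yaml
# Hypothetical CiliumNetworkPolicy: workloads are selected by label
# (identity), not by IP, so the rule follows pods wherever they run.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: research-apps      # illustrative namespace
spec:
  endpointSelector:
    matchLabels:
      app: api                  # policy applies to pods with this label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend       # only these identities may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:               # L7 rule: restrict to specific requests
              - method: GET
                path: "/api/.*"
```

Because the selectors reference labels rather than addresses, the same manifest keeps working as pods are rescheduled or the cluster scales; for enforcement spanning namespaces or clusters, Cilium offers the analogous cluster-wide policy resource.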
Observability without the overhead
As the team’s Kubernetes footprint expanded, they needed a way to observe, secure, and control traffic between workloads and across clusters.
A service mesh provided a path to consistent L7 traffic management and visibility, but traditional sidecar-based approaches were too resource-intensive for a research environment. Cilium’s sidecarless service mesh approach offered the same observability and policy controls at a fraction of the cost and complexity.
“Traditional service meshes would have required proxy sidecars in every pod across every cluster, a massive compute tax on a research budget,” Sherman explains. “Cilium’s service mesh uses eBPF and a node proxy to eliminate the overhead of sidecars.”
The architecture combines Hubble’s real-time flow metrics with Prometheus for centralized monitoring.
“Researchers are extremely cost aware when it comes to spending,” notes Sherman. “Instead of running multiple Prometheus instances on every cluster, we deployed lightweight agents that forward metrics into a central observability platform. Combined with Cilium Hubble, we got comprehensive visibility without infrastructure overhead.”
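A minimal sketch of the pattern Sherman describes: a lightweight per-cluster Prometheus running in agent mode that scrapes local targets (including Hubble’s metrics endpoint) and forwards everything to a central hub. The remote-write URL and external label are hypothetical placeholders:

```yaml
# prometheus.yml for a lightweight per-cluster agent
# (started with: prometheus --enable-feature=agent --config.file=prometheus.yml)
global:
  scrape_interval: 30s
  external_labels:
    cluster: staging-01          # hypothetical label to distinguish clusters centrally

scrape_configs:
  - job_name: hubble             # Hubble exposes flow metrics on port 9965 by default
    static_configs:
      - targets: ["hubble-metrics.kube-system.svc:9965"]

# Agent mode performs no local querying, alerting, or long-term storage;
# it only scrapes and forwards, keeping the per-cluster footprint small.
remote_write:
  - url: https://metrics.example.edu/api/v1/write   # hypothetical central hub
```

This keeps each cluster’s monitoring footprint to a single forwarding agent while all querying and retention happen in one place.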
Developers can visualize service-to-service communication, trace L7 traffic flows, and troubleshoot issues quickly—without sacrificing compute resources.
Impact: Driving operational efficiency at scale
UW–Madison’s cloud native architecture now runs leaner, faster, and more predictably. Cilium Hubble provides continuous insight into network flows, while Prometheus delivers metrics to a single observability hub. Together, they form a lightweight but powerful monitoring pipeline that scales with demand and minimizes maintenance overhead.
By separating development, staging, and production environments, the team improved overall platform stability and simplified troubleshooting. And with zero sidecar overhead, UW–Madison delivers comprehensive Layer 7 observability and performance monitoring—all while keeping resource usage low and operations lean.
“Managing 25+ Kubernetes clusters in support of research workloads is complex,” Sherman reflects. “Researchers want to maximize compute for science, not infrastructure overhead. Cilium lets us deliver deep network visibility across all clusters while keeping our per-cluster footprint minimal.”
Looking ahead
The team continually evaluates its tooling and is expanding its use of the Cilium ecosystem to consolidate tools, exploring Tetragon for runtime security enforcement and enhanced VM connectivity to unify Kubernetes and traditional workloads.
“Cilium has been phenomenal,” says Sherman. “We started with core networking, added Hubble for observability and Cluster Mesh for cross-cluster communication, and exploring Tetragon is always on the table. It’s a platform that grows with our needs, and from our interactions with the community, we’re confident in its long-term support and roadmap.”