Case Study

ClickHouse

How ClickHouse is Using Cilium to Implement Efficient Network Policies

Challenge

ClickHouse Cloud is a managed service on top of the open source online analytical processing (OLAP) database, ClickHouse. To make efficient use of resources and keep customer data secure, their platform needed to be able to isolate customer processes from each other. ClickHouse needed a tool to help them implement efficient network policies for their Kubernetes workloads and provide them with strong isolation per customer.

Solution

ClickHouse turned to Cilium as their preferred networking solution to take advantage of eBPF performance and simplify the process of isolating customers from each other. Cilium enabled them to create dedicated CiliumNetworkPolicies for each customer’s Kubernetes namespace to control access to specific resources, even if a customer manages to break into their Kubernetes pods.

Impact

With Cilium, they’ve developed a system that completely isolates customers from one another. They can now run their customers’ processes at scale and keep their customers’ data secure and isolated. They have also gained extra value from Cilium by enabling additional features like ClusterMesh and Hubble. Cilium has helped ClickHouse ingest petabytes of data, and process trillions of inserts and billions of selects on top of ClickHouse Cloud with efficiency and control in a multi-tenant serverless environment.

Published:
June 1, 2023

Projects used

Argo
Cilium
Helm
Istio
Kubernetes

By the numbers

Massive scale

Secured 10+ PiB of streaming data and 30+ trillion inserted records in the first months of deployment

Multi-cloud

10,000+ pods across multiple regions and cloud providers

Time to value

Weeks from POC to in-production deployment

Building an Efficient Networking Layer with Cilium

ClickHouse has three teams working together on their cloud offering: the control plane team, the data plane team, and the core team. ClickHouse Cloud runs across AWS and Google Cloud and will expand to more clouds as their customers demand it. The data plane team is responsible for the networking and needed a solution that worked everywhere their customers wanted to go. In each cloud provider, they run multiple ClickHouse Keeper and server replicas that all need to talk to one another, plus autoscaling, idle scaling, provisioning, and monitoring components that need access to those replicas.

When they initially started building ClickHouse Cloud, they gave each tenant a namespace to run their workload. However, they quickly realized they needed stronger isolation between the tenants. They started evaluating different CNI options, like AWS VPC CNI and Calico, but ultimately decided to go with Cilium.

“We checked a few performance comparisons and I just like [Cilium’s] eBPF approach a lot more. It worked out of the box, and the documentation was really nice. We also trust in Cilium because it has really broad adoption.”

Marcel Birkner, Cloud Software Engineer, ClickHouse.

Cilium also gave them more options for configuring network policies, such as FQDN-based rules. They created dedicated CiliumNetworkPolicies for each customer: one network policy for ClickHouse Keeper and another for each ClickHouse server.
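A per-customer policy of this shape might look like the following sketch. The namespace, labels, ports, and FQDN pattern are illustrative assumptions, not ClickHouse's actual manifests:

```yaml
# Hypothetical per-tenant policy: lock a customer's ClickHouse server
# down to in-namespace Keeper traffic plus FQDN-scoped egress.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: clickhouse-server
  namespace: customer-abc123        # one namespace per tenant (assumed name)
spec:
  endpointSelector:
    matchLabels:
      app: clickhouse-server
  ingress:
    # Only Keeper pods in the same namespace may reach the server
    - fromEndpoints:
        - matchLabels:
            app: clickhouse-keeper
      toPorts:
        - ports:
            - port: "9181"
              protocol: TCP
  egress:
    # Allow DNS so that the FQDN rule below can be resolved and enforced
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # FQDN-based rule: egress only to object storage over TLS
    - toFQDNs:
        - matchPattern: "*.s3.amazonaws.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Because the `endpointSelector` and `fromEndpoints` are scoped to labels inside the tenant's namespace, a compromised pod in one namespace has no policy path to another customer's pods.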

Image demonstrating CiliumNetworkPolicies for customers

Enabling Additional Features: ClusterMesh and Hubble

Once they had Cilium set up and configured their network policies, they also realized that there were other useful features in Cilium. 

“We didn’t plan to use ClusterMesh in the beginning, but we later realized that we have another great feature that we leverage for Ingress.”

Timur Solodovnikov, SRE, ClickHouse.  

In front of their data plane clusters, they run dedicated Kubernetes clusters that use Istio as an ingress proxy and for TLS termination. To make this ingress work across all their clusters, they used Cilium’s ClusterMesh because it allowed them to forward traffic based on Kubernetes service names. After including ClusterMesh in their setup, their NetworkPolicies continued to work across clusters and they used labels to enable Istio ingress to reach the ClickHouse Server. 

When a new security requirement came up, that external inbound TCP connections to any ClickHouse service be allowed only from the Istio proxy, it wasn’t a problem thanks to ClusterMesh. They were able to seamlessly allow access from a remote Kubernetes cluster to their ClickHouse service through Istio using just a network policy. While ClusterMesh wasn’t on their initial roadmap, with Cilium already installed in their cluster, it was an easy feature to switch on to meet the changing demands of their business.
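With ClusterMesh, a policy can select endpoints in a remote cluster by label, including the cluster name itself via the special `io.cilium.k8s.policy.cluster` label. A sketch of such a rule, with hypothetical cluster, namespace, and label names, could look like this:

```yaml
# Hypothetical policy: only the Istio ingress gateway pods running in a
# remote, ClusterMesh-connected ingress cluster may open inbound TCP
# connections to the ClickHouse server.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-istio-ingress-only
  namespace: customer-abc123        # assumed tenant namespace
spec:
  endpointSelector:
    matchLabels:
      app: clickhouse-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            # select endpoints from the remote cluster by its mesh name
            io.cilium.k8s.policy.cluster: ingress-cluster
            app: istio-ingressgateway
      toPorts:
        - ports:
            - port: "9440"          # ClickHouse secure native port
              protocol: TCP
```

Because identities are shared across the mesh, the same label-based rule works whether the Istio proxy runs locally or in the dedicated ingress cluster.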

As they started to roll out the platform, they also ran into a few bugs; luckily, they could leverage Hubble to help solve them.

“I used Hubble to debug [the issues], to see network flows, how things are going, where it’s blocked, because we had problems with traffic forwarding and it wasn’t clear. What is that? Is it a network policy or something else? When we initially installed Cilium, we didn’t enable Hubble, but now we have it installed in every cluster because it is so useful for debugging.”   

Timur Solodovnikov, SRE, ClickHouse.  
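The kind of debugging described above maps onto a few `hubble observe` queries. This is a sketch; the namespace and pod names are hypothetical:

```shell
# Show recently dropped flows in a tenant namespace, to check whether
# a network policy is the thing rejecting the traffic
hubble observe --namespace customer-abc123 --verdict DROPPED --since 10m

# Follow live flows toward a specific pod to see where forwarding stops
hubble observe --to-pod customer-abc123/clickhouse-server-0 --follow
```

Seeing a flow with verdict `DROPPED` and a policy-related drop reason quickly answers the “is it a network policy or something else?” question.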

Meeting Business Requirements and Providing Further Value

Cilium has now become an integral part of ClickHouse’s infrastructure, providing all-around solutions to their networking and observability needs. It helped them meet their business requirements by providing secure isolation for customers. Once Cilium was installed, enabling ClusterMesh allowed ClickHouse to simplify their networking stack and Hubble provided deep observability for debugging networking issues. Cilium has allowed ClickHouse to forget about their network and focus on their business.

If you would like to read more about ClickHouse Cloud architecture and how Cilium fits in, you can read about it in this blog.