How Ascend leverages Cilium as a networking layer
Ascend is a data automation platform that runs a lot of batch and short-lived workloads to ingest, transform, and orchestrate customer data. Their platform needs to run where the data is and ensure that the data stays secure. They needed a consistent networking layer that could handle their high-churn environment and keep data safe wherever their customers’ data lives.
Ascend turned to Cilium as their CNI, which simplified integrating into customer networks, eliminated their IP churn and density issues, and provided them with reliable encryption and network policies. It also gave them a consistent experience across every cloud and on-premises environment their customers required.
With Cilium, Ascend is able to reduce the time it takes to debug networking issues from 4-16 hours down to 20 seconds. They are also able to run the same CNI across all of their customer environments, reducing context switching and providing consistent connectivity and encryption wherever required. Cilium allows them to run a higher pod density, which reduces their compute requirements. It also increases network architecture flexibility by requiring only 1/256 as many IPs as the default cloud provider networking, which simplifies and speeds up installation into customer environments.
By the numbers
256x fewer IPs: running an overlay that gives pods non-VPC-routable IPs
Complete data encryption: at rest and in transit
Network debugging time: from 4-16 hours down to 20 seconds
Ascend is the leader in Data Pipeline Automation for building the world’s most intelligent data pipelines. It’s a single platform with intelligence to detect and propagate change across your data ecosystem and ensure accuracy end-to-end. Customers automate up to 90% of repetitive data engineering and reduce infrastructure costs with one place for end-to-end observability and automated lineage tracing. Customers can accurately cost data products with metadata-driven insights into team and solution resources used across their landscape. Ascend partners at every step of the data journey with product innovation and expert support that frees customers to focus on achieving goals.
Data has a lot of gravity and isn’t easy to move around, so Ascend needs to meet their customers where they – and their data – are. Data can also contain a lot of sensitive information that needs to stay private. Ascend needed to build their platform so that it could work wherever their customers needed and keep their data secure at the same time. Ascend chose Cilium to provide secure and consistent connectivity across clouds and on-premises environments. Cilium allows Ascend to automate their network while still keeping it secure.
Building a Unified Networking Layer Everywhere with Cilium
Being a data automation platform means Ascend needs to go where there is data, including integrating into customer networks and infrastructure.
“We have to meet customers at their most sensitive locations, even installing actually in their infrastructure or in their network. The challenges that this brings on is the breadth of things that we need to integrate with is quite wide because we have to understand the integration points for every cloud and their many different storage endpoints.” Joe Stevens, Member of the Technical Staff, Ascend.
Stevens runs a lean team at Ascend that prioritizes efficiency over just stacking humans, which means that they are always looking for leverage points where they can reduce the work required to get things done. For example, Ascend uses Kubernetes because every cloud has a managed service which immediately gives them a level of portability.
Since they are processing data, Ascend’s workloads are slightly different from what is normally run on Kubernetes. Rather than a web application, Ascend runs more function-as-a-service or batch-based workloads. That means a lot of short-lived or single-use pods with a high pod density. With all of the pod churn Ascend experiences, these workloads risk quickly running out of IPs. To help mitigate this, Ascend decided to double the number of IPs they were expecting to need on a specific node.
Ascend was originally using Flannel as their CNI when they were running Kops. However, as they migrated from Kops to EKS, they quickly determined that the AWS VPC CNI wouldn’t work for them because of the limits on how many ENIs could be installed on each node and the IPs those ENIs provided. This led them to explore options for bringing their own CNI.
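To make the ENI limit concrete, here is a minimal Python sketch of the commonly documented pod ceiling for the AWS VPC CNI in its default mode, where each pod takes one secondary IP from an ENI. The function name and the instance figures used in the example (3 ENIs with 10 IPv4 addresses each) are illustrative assumptions, not numbers from Ascend's clusters.

```python
def max_pods_with_eni_ips(enis: int, ips_per_eni: int) -> int:
    """Approximate pod ceiling per node when every pod needs an ENI IP.

    Assumes the default (non prefix-delegation) AWS VPC CNI mode:
    one address on each ENI is its own primary IP, and 2 is added
    for pods that run on the host network.
    """
    return enis * (ips_per_eni - 1) + 2


# Hypothetical node type with 3 ENIs and 10 IPv4 addresses per ENI.
print(max_pods_with_eni_ips(enis=3, ips_per_eni=10))
```

For a workload made of many short-lived pods, a ceiling like this is reached long before CPU or memory runs out, which is the density problem described above.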
“At the time Calico appeared to be the most prominent and available, but we also saw what was happening with eBPF. When we looked closer at Cilium, we saw a few things like network policy, Hubble, and little things like the network policy editor. It’s delightful. It’s really easy to use and it’s the only one I’ve seen. What Cilium was building out was just a lot more functionality than anyone else in the space and it seemed like a good thing to hitch our bandwagon to.” Joe Stevens, Member of the Technical Staff, Ascend.
Ascend never ran the AWS VPC CNI in production because of the IP limits issue, and jumped directly to Cilium. Cilium can run in EKS using an overlay mode that gives pods non-VPC-routable IPs, reducing the number of IPs they needed by 256x, so Ascend started rolling Cilium out into production. No longer needing an externally routable range for pods allowed Ascend to be more flexible with their customer networks and even overlap with portions of them. As long as Ascend didn’t collide with the data sources they needed to connect to, the routing problem was resolved. Over time, as Ascend learned how hard it was to perform private network installs in AKS and GKE, they decided to adopt the same networking model everywhere with Cilium in order to bring the benefits of the network overlay.
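The two ideas in this paragraph, fewer routable IPs and avoiding collisions with data sources, can be sketched with Python's standard ipaddress module. This is an illustration under stated assumptions, not Ascend's tooling: the function names and all CIDRs are hypothetical, and the 255-pods-per-node figure is simply one configuration under which the arithmetic yields a 256x ratio. The source does not say how its 256x number was derived.

```python
import ipaddress


def routable_ips_needed(nodes: int, pods_per_node: int, overlay: bool) -> int:
    """Count the network-routable IPs a cluster consumes.

    With an overlay, only nodes take routable addresses; without one,
    every pod needs a routable address as well.
    """
    return nodes if overlay else nodes * (1 + pods_per_node)


def overlay_conflicts(pod_cidr: str, data_source_cidrs: list[str]) -> list[str]:
    """Return the data source ranges that overlap the overlay pod range."""
    pod_net = ipaddress.ip_network(pod_cidr)
    return [
        cidr for cidr in data_source_cidrs
        if pod_net.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical cluster: 10 nodes, 255 pods per node.
without_overlay = routable_ips_needed(10, 255, overlay=False)
with_overlay = routable_ips_needed(10, 255, overlay=True)
print(without_overlay // with_overlay)

# Hypothetical ranges: the overlay only has to avoid the data sources.
print(overlay_conflicts("10.0.0.0/16", ["10.0.5.0/24", "192.168.1.0/24"]))
```

An empty list from overlay_conflicts means the pod range can be reused even if it overlaps other parts of the customer network, as long as those parts are not data sources the platform must reach.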
This was massive from the customer acquisition point of view. Surprisingly, one of Ascend’s largest challenges is just figuring out how to integrate into a customer’s network. “You get into rooms with the customer networking team, and pretty quickly you start to run into challenges because we’re coming in as a vendor asking for a significant IP footprint, and in many cases they just don’t have a large enough allocation available, which is completely reasonable! Meanwhile as we’re navigating the options to establish an IP range for the product, support is working with the end user trying to find ways to help keep them productive until the install can be completed. At the end of the day the IP range negotiation itself is not a good use of anyone’s time, just a friction point and cost to the business,” said Stevens.
Securing Data with Encryption and Network Policy
Networking topology wasn’t the only challenge that Cilium helped Ascend solve. They also needed to keep their data secure. Cilium helped Ascend encrypt data and create secure network policies.
A large part of processing all sorts of sensitive data is figuring out compliance. For example, to comply with HIPAA, Ascend needed to ensure that all data is encrypted at rest and in transit. When Ascend originally set encryption up, they used certificate-init-container to generate certificates but ran into all sorts of problems, like having to wait for the startup container before they could start their service or the cert container just expiring its way out of existence. Eventually, it was no longer possible to use these startup containers to bootstrap all of the certs. “Why spend any time doing that, if you could just have the actual network itself take care of that for you?” said Stevens. Ascend switched to using Cilium for encrypting data and hasn’t looked back.
“Spark was the other reason that Cilium became a killer feature that we needed to roll out across every cloud. Spark is a great tool, but sometimes their built in encryption will fail at random. Statistically, at some point it will crash so if you’re dealing with a 12 hour job, it’s gonna fail on hour 11 and that is a terrible thing to try to explain to the customer. Cilium with IPsec doesn’t have that problem. Why have Spark be doing encryption when what we really want Spark to be doing is processing data. We chose to have a reasonable isolation of priorities and responsibilities and have Spark be focused on data processing and have the network layer that is responsible for encryption.” Joe Stevens, Member of the Technical Staff, Ascend.
On the network policy side, Ascend needed to be prepared for actors inside of their compute environment that aren’t fully trusted. As more multi-tenant environments were rolled out, network policy and the ability to control who has access to what became more and more crucial. “From experience, we know that getting network policy correct is difficult and when you do get it wrong it is a nightmare. Trying to understand what’s going on with traditional tooling means you probably throw three engineers at the problem for five hours, while with Hubble you know what’s happening in about three seconds. It was one of those very easy trade-offs to explain to my CEO. We’re going to encounter the cost of debugging, let’s make it a lot less expensive.”
Cilium: A Unified Network Layer
Cilium has become a core part of Ascend’s platform and the first thing they install on a new Kubernetes cluster. Networking is not Ascend’s core business, and Cilium allows them to create the network and not have to worry about it. For the future, Ascend is looking at Gateway API and Cilium Service Mesh.
“A lot has changed since we first started adopting Cilium. The industry has become more clear with Cilium becoming the de facto networking layer for all three clouds. The last one that really sent it home was the Azure switching to using Cilium as their CNI. This brings us consistency across all clouds wherever our customers are. For points of leverage to reduce the amount of work that we need to do overall, this consistency is a major one of them. It allows us to know what we’re working with everywhere and we have the same networking layer in all three clouds. We can have that same Hubble experience, the same networking guarantees, and features, like encryption. Cilium is a unified network layer that solves a bunch of problems that we want to solve.” Joe Stevens, Member of the Technical Staff, Ascend.