
Transforming Networking with Cilium at Ecco
Ecco, a global leader in shoe production and retail, operates across the world, crafting high-quality leather goods, including fashionable bags and shoe accessories. To support its operations, Ecco employs a data-driven approach to optimize its supply chain and forecast demand. The company’s IT infrastructure, nearly 100% Kubernetes-based, is designed to facilitate machine learning (ML) workflows that enable intelligent decision making and supply chain management.
Challenges
Ecco faced several hurdles with its IT infrastructure as it scaled globally. Managing networking across multiple cloud providers created complexity, and costs associated with data transfer and load balancing were mounting. The default networking solutions also caused performance bottlenecks, slowing down machine learning workloads critical to supply chain optimization. Furthermore, vendor lock-in restricted Ecco’s ability to adapt and innovate in its multi-cloud strategy.
Solution
To address these challenges, Ecco implemented Cilium, leveraging its eBPF-based capabilities to simplify and enhance networking. The team used Cilium Cluster Mesh for unified networking across multiple Kubernetes clusters, enabling seamless communication without relying on costly cloud-specific solutions. They also adopted Cilium’s chaining mode for a smooth transition and integrated it with MinIO to establish a fixed-cost, high-performance storage system. Costs were reduced by 50%, saving hundreds of thousands of dollars annually, while latency for critical operations dropped by 33%. Modular deployment allowed Ecco to address issues incrementally, avoiding disruptions to existing workflows. With its robust and flexible networking, Cilium has empowered Ecco to innovate and scale its operations efficiently and cost-effectively.
Cost and Performance Limitations On A Machine Learning Platform For Global Supply Chain Operations
Ecco, a household name in leather goods and retail, operates in nearly every corner of the world. The company’s vertically integrated approach of sourcing raw materials, producing leather, and crafting high-quality products requires a finely tuned supply chain. To support this complexity, Ecco’s IT infrastructure plays a pivotal role, leveraging a Kubernetes-based platform that powers machine learning workloads for operations like demand forecasting and supply chain optimization.
As Ecco scaled its operations, the limitations of its existing IT setup became apparent. Initially, the company relied on default cloud networking solutions, such as the AWS VPC CNI. While functional, these solutions were not built for the demands of a global enterprise handling terabytes of data daily. George Zubrienko, Ecco’s platform engineer, explained, “Managing connections across different cloud storage providers was cumbersome. We needed a solution that could simplify this complexity and reduce costs.”
By the numbers
50% reduction in networking and storage costs
33% decrease in latency for machine learning workflows
30% less developer time spent preparing applications for production
The cost of data transfers and load balancing was one of the biggest challenges. Ecco’s machine learning workloads required rapid, reliable access to large datasets spread across multiple storage solutions. Each transfer incurred expenses, and moving terabytes of data daily made these costs unsustainable. At the same time, relying on cloud-specific networking solutions created vendor lock-in. As Zubrienko put it, “This dependency limited our ability to adapt our infrastructure as our needs evolved.”
Performance was equally pressing. Latency in data access slowed machine learning algorithms, a key component for inventory optimization. “The bottleneck was often just reading the data,” Zubrienko explained. “If it takes five minutes to read data instead of 30 seconds, our decisions are delayed, and that affects the entire supply chain.”
Reducing Costs and Complexity with Cilium Cluster Mesh
Ecco’s IT team began evaluating alternatives to overcome these hurdles. They explored solutions like AWS’s VPC Lattice and service meshes such as Istio. Ultimately, Cilium’s eBPF-based approach stood out. Zubrienko noted, “Cilium’s documentation made it clear what we needed to do. It wasn’t like other solutions where you’re stuck figuring out why the mesh isn’t connecting.”
The implementation started with Cilium’s chaining mode, enabling a smooth transition from their existing VPC CNI setup. This incremental approach minimized disruptions and allowed the team to evaluate Cilium’s capabilities step by step. Ecco first used Cilium to tackle the cost of load-balanced traffic, which had become a significant financial burden. “Cloud load balancers charge you for every byte of traffic,” Zubrienko explained. “For a company like ours, moving terabytes of data daily across multiple cloud environments, these costs quickly added up to unsustainable levels.”
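To make the per-byte pricing model concrete, here is a rough back-of-the-envelope sketch. Both the daily traffic volume and the per-gigabyte rate are hypothetical placeholders chosen for illustration; neither is Ecco's figure nor any provider's published price.

```python
# Hypothetical inputs: neither value comes from Ecco or from a real price list.
DAILY_TRAFFIC_TB = 5.0    # assumed traffic routed through load balancers per day
PRICE_PER_GB_USD = 0.01   # assumed per-gigabyte traffic charge


def annual_load_balancer_cost(daily_tb: float, price_per_gb: float, days: int = 365) -> float:
    """Return the yearly traffic charge for a load balancer billed per gigabyte."""
    gb_per_day = daily_tb * 1000  # decimal terabytes to gigabytes
    return gb_per_day * price_per_gb * days


if __name__ == "__main__":
    cost = annual_load_balancer_cost(DAILY_TRAFFIC_TB, PRICE_PER_GB_USD)
    print(f"Estimated annual traffic charge: ${cost:,.0f}")
```

Because the charge scales linearly with volume, any growth in daily data movement shows up directly in the bill, which is the unpredictability the quote describes.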
With the adoption of Cilium, Ecco was able to eliminate its reliance on traditional cloud-based load balancers by implementing direct pod-to-pod communication across Kubernetes clusters using Cilium Cluster Mesh. This not only reduced the need for intermediary traffic routing but also ensured that internal communication could bypass expensive load balancer fees without sacrificing performance. “By leveraging Cilium’s eBPF capabilities, we maintained high-performance networking while completely removing unpredictable traffic costs,” Zubrienko added. Cilium’s ability to unify clusters into a seamless network meant that Ecco could route traffic internally in an optimized manner, avoiding the bottlenecks and additional expenses tied to traditional load balancing solutions.
The improvements also extended beyond cost savings. The streamlined traffic flow significantly improved latency and network efficiency, allowing Ecco to handle the demands of its machine learning workloads more effectively. “Each Spark executor running on a separate machine is now able to directly connect to the storage node that it wants to write data to, so it’s essentially two hosts just talking to each other. When we benchmarked, latency dropped by 33% and tasks that previously took minutes to process were completed in seconds, which directly impacted our decision-making capabilities and operational speed,” Zubrienko noted. This optimization was particularly impactful for their global supply chain operations, where every second counts in forecasting demand and managing inventory.
Improving the network not only reduced costs, but also complexity. “All our applications perceive storage as a single endpoint,” Zubrienko explained. “Regardless of changes, the client applications always see the same URL internally. That consistency simplifies our operations immensely.”
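One way a single, stable storage endpoint can be exposed across meshed clusters is Cilium's global service feature, where a Service with the same name and namespace in each cluster carries the `service.cilium.io/global: "true"` annotation. The sketch below is illustrative only and is not Ecco's actual configuration: the service name, namespace, selector label, and port are hypothetical, and the URL assumes the default `cluster.local` cluster domain.

```python
import json


def global_storage_service(name: str, namespace: str, port: int) -> dict:
    """Build a Service manifest that Cilium Cluster Mesh treats as a global service."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Marks the Service as shared across all clusters in the mesh.
            "annotations": {"service.cilium.io/global": "true"},
        },
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": name},  # hypothetical pod label
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def internal_url(manifest: dict) -> str:
    """Return the in-cluster URL clients would use (default cluster domain assumed)."""
    meta = manifest["metadata"]
    port = manifest["spec"]["ports"][0]["port"]
    return f"http://{meta['name']}.{meta['namespace']}.svc.cluster.local:{port}"


if __name__ == "__main__":
    # Hypothetical names; the same manifest would be applied in every cluster.
    svc = global_storage_service("object-storage", "storage", 9000)
    print(json.dumps(svc, indent=2))
    print(internal_url(svc))
```

Since the manifest is identical in every cluster, client applications keep resolving the same internal URL no matter which cluster currently hosts the storage backends.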
By replacing load balancers with a Cilium-powered approach, Ecco not only reduced costs by hundreds of thousands of dollars annually but also built a more scalable and future-proof networking architecture. The company’s IT infrastructure now supports rapid data access and real-time processing at a fraction of the cost, ensuring that Ecco can continue to innovate and grow without being constrained by legacy networking costs.
“The huge cost reduction was the main selling point of this project, plus it greatly simplified application deployments, meaning developers will spend 30% less time preparing the app for production. And again, for algorithms and data preparation, it’s very important to make decisions fast because as we sell shoes, our knowledge about the state of our inventory evolves, and we cannot make optimal decisions if we make them on past data.”
Incremental Enhancements for Big Improvements
Cilium’s modular design proved to be a game-changer for Ecco. The company could adopt features gradually, aligning with its priorities. “One thing I like very much about Cilium is that you can enable what you want step by step,” Zubrienko said. “It doesn’t destroy the whole system, and you can add more when you’re ready.”
Integrating Cilium with MinIO was another milestone. MinIO, an open-source, S3-compatible storage system, allowed Ecco to replace costly cloud storage with a fixed-cost, high-performance alternative. “Instead of paying $300,000 a year for storage, we now pay around $150,000,” Zubrienko shared. “That’s a 50% reduction in costs, and it’s all thanks to Cilium enabling a more efficient network. Even with IPv6 networks, enabling Cilium was just a flip of a switch. It was surprisingly straightforward.”
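The storage figures quoted above can be checked with a line of arithmetic. The sketch below uses only the two dollar amounts from the quote; the function name is an illustrative helper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    before_usd = 300_000  # annual storage cost before, as quoted
    after_usd = 150_000   # annual storage cost after, as quoted
    print(f"Annual saving: ${before_usd - after_usd:,}")
    print(f"Reduction: {percent_reduction(before_usd, after_usd):.0f}%")
```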
This flexibility opened doors for future enhancements. For example, Ecco plans to eliminate kube-proxy, which has become a bottleneck in network performance, and explore advanced features like Egress Gateway. “Our goal is to make our platform fully portable between cloud and on-prem environments,” Zubrienko explained. “Cilium is helping us get there.”
Ecco’s journey with Cilium is a testament to the power of open-source innovation. Despite its rapid development, Cilium has maintained exceptional reliability and clarity in documentation. Zubrienko remarked, “With Cilium, what I see is what I get. It’s rare to find an open source project that’s so well-maintained. It makes my life much easier, and I can trust it completely.”
The impact of Cilium on Ecco’s business is profound. Beyond technical achievements, it has driven tangible business results. Faster compute and data access improved decision-making, ensuring products reached customers in time. “Speed is everything in our business,” Zubrienko emphasized. “If we can move goods efficiently and forecast demand accurately, we win. Cilium has become a cornerstone of our strategy. It’s not just a tool; it’s a game-changer.”