Case Study

ECCO

Transforming Networking with Cilium at ECCO

ECCO, a global leader in shoe production and retail, operates across the world, crafting high-quality leather goods, including fashionable bags and shoe accessories. To support its operations, ECCO employs a data-driven approach to optimize its supply chain and forecast demand. The company’s IT infrastructure, nearly 100% Kubernetes-based, is designed to facilitate machine learning (ML) workflows that enable intelligent decision-making and supply chain management.

Challenges

ECCO, a global leader in footwear production and retail, operates across the world, crafting high-quality leather goods, including fashionable bags and shoe accessories. To support its operations, ECCO employs a scientific approach to optimize its supply chain and fulfill customer demand in the best way possible, aiming to deliver the right product, at the right place, at the right time. The set of products that serves this purpose is developed and maintained by a dedicated department, ECCO Data & AI. The team’s infrastructure, nearly 100% Kubernetes-based, is designed to facilitate machine learning and AI workflows integrated into the company’s operations, enabling automated, intelligent decision-making and supply chain management.

ECCO Data & AI faced several hurdles with its IT infrastructure as it scaled globally. Managing networking across multiple cloud providers, combined with heavy reliance on load balancers and NAT gateways, created complexity and raised the cost of the traffic flowing between applications as optimizations were computed and applied to the supply chain. Networking solutions commonly found in hyperscaler offerings also caused performance bottlenecks, slowing down machine learning workloads critical to supply chain optimization. Furthermore, the vendor lock-in associated with cloud providers’ default networking solutions limited the company’s ability to innovate and integrate cutting-edge open source products.

Solution

To address these challenges, ECCO Data & AI implemented Cilium, leveraging its eBPF-based capabilities to simplify and enhance networking. The team used Cilium Cluster Mesh to create a single networking layer spanning multiple Kubernetes clusters, enabling seamless communication between applications and storage without relying on costly cloud-specific solutions. They also adopted Cilium’s chaining mode with the AWS VPC CNI, so the existing infrastructure could start using Cilium without a migration phase, while paving the way for a smooth transition to Cilium IPAM in the future. With an S3-compatible storage solution (MinIO) deployed as the “sun” of the network’s “universe”, applications could connect to it privately and securely via Cilium Cluster Mesh, providing a fixed-cost, high-performance storage system.

Impact

Storage costs were reduced by 50%, while latency for critical operations dropped by 33%. The modular deployment capabilities of Cilium’s Helm chart allowed ECCO Data & AI to address issues incrementally, avoiding disruptions to existing workflows. With its robust and flexible networking, Cilium has empowered ECCO Data & AI to innovate and scale its operations effectively and cost-efficiently.

Cost and Performance Limitations On A Machine Learning Platform For Global Supply Chain Operations

ECCO, a household name in leather goods and retail, operates in nearly every corner of the world. The company’s vertically integrated approach of sourcing raw materials, producing leather, and crafting high-quality products requires a finely tuned supply chain. To support this complexity, ECCO Data & AI plays a pivotal role, leveraging a Kubernetes-based platform that powers machine learning workloads for operations like demand forecasting and supply chain optimization.

Initially, the company relied on default cloud networking solutions, such as the AWS VPC CNI, and stored its data in both AWS S3 and Azure Data Lake. While functional and reliable, the VPC CNI does not provide out-of-the-box support for communication across EKS (Kubernetes) clusters without adding Network Load Balancers or VPC Lattice. S3 and Azure Data Lake, although both are industry-standard object storage solutions, are difficult to use together in a sustainable manner. Finally, ECCO Data & AI’s strategy was to have full control over what, where and when everything is computed. George Zubrienko, Data & AI lead platform engineer, explained: “Managing connections across different cloud storage providers was cumbersome. We needed a solution that could simplify this complexity and will scale with our business and not the volume of our storage transactions.”

Published: June 18, 2025


By the numbers

33% reduction in latency for critical operations

50% lower storage costs

30% speedup of SQL workloads due to reduced internal network latency


The cost of data transfers and load balancing was one of the biggest challenges. The Data & AI team’s workloads required rapid, reliable access to large datasets. Every transfer incurred a charge, and with the volume of data processed growing daily, these costs would have become unsustainable in the long run. At the same time, relying on cloud-specific networking and storage solutions created vendor lock-in. As Zubrienko put it, “This dependency limited our ability to adapt our infrastructure as our needs evolved.”

Although performance was a less pressing concern than cost, ECCO’s Data & AI team still strove to push boundaries. “The bottleneck was often just reading the data,” Zubrienko explained. “If it takes five minutes to read data instead of 30 seconds, our decisions are delayed, and that affects the entire supply chain.”

Reducing Costs and Complexity with Cilium Cluster Mesh


ECCO’s Data & AI Platform Engineering team began evaluating alternatives to overcome these hurdles. They explored solutions like AWS’s VPC Lattice and service meshes such as Istio. Ultimately, Cilium’s eBPF-based approach stood out. Zubrienko noted, “Cilium’s documentation made it clear what we needed to do. I never felt like I’m stuck figuring out why the mesh wasn’t connecting.”


The implementation started with Cilium’s chaining mode, which allowed Cilium to be adopted immediately on top of the existing AWS VPC CNI setup. This incremental approach minimized disruptions and allowed the team to evaluate Cilium’s capabilities step by step. Cilium first helped Data & AI address the cost of load-balanced traffic in the new platform design, which could otherwise have become a significant financial burden. “Cloud load balancers charge you for every byte of traffic,” Zubrienko explained. “Moving high volumes of data daily between MinIO, Trino, Spark instances and AI/ML algorithm training and inference containers, these costs quickly added up to unsustainable levels in our PoC environment.”
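The chaining setup described above can be expressed as a handful of Helm values. The following is a minimal Python sketch, assuming the values listed in Cilium’s AWS VPC CNI chaining guide for recent releases (releases before 1.14 used `tunnel=disabled` instead of `routingMode=native`). The case study does not publish ECCO’s actual configuration, and the helpers `flatten` and `to_set_flags` are hypothetical illustrations that render the values as `helm --set` arguments.

```python
"""Sketch: Helm values for running Cilium in chaining mode on top of the AWS VPC CNI.

Assumption: keys follow Cilium's AWS VPC CNI chaining guide for recent releases;
confirm them against the documentation for the Cilium version you deploy.
The helper functions are hypothetical, not part of any Cilium tooling.
"""


def flatten(values: dict, prefix: str = "") -> dict:
    """Flatten nested Helm values into dotted keys, e.g. {"cni": {"exclusive": False}} -> {"cni.exclusive": False}."""
    flat = {}
    for key, value in values.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def to_set_flags(values: dict) -> list[str]:
    """Render values as sorted `--set key=value` pairs, lowercasing booleans as Helm expects."""
    flags = []
    for key, value in sorted(flatten(values).items()):
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        flags.extend(["--set", f"{key}={rendered}"])
    return flags


# The AWS VPC CNI keeps assigning pod IPs from the VPC; Cilium is chained on top of it.
AWS_CNI_CHAINING_VALUES = {
    "cni": {
        "chainingMode": "aws-cni",
        "exclusive": False,  # leave the existing aws-node CNI configuration in place
    },
    "enableIPv4Masquerade": False,  # leave SNAT to the AWS VPC CNI
    "routingMode": "native",        # pod IPs are routable VPC addresses, so no tunnel
    "endpointRoutes": {"enabled": True},  # per-endpoint routes, as the chaining guide recommends
}

if __name__ == "__main__":
    cmd = ["helm", "install", "cilium", "cilium/cilium", "--namespace", "kube-system"]
    print(" ".join(cmd + to_set_flags(AWS_CNI_CHAINING_VALUES)))
```

Keeping the values in one data structure makes it straightforward to diff them between environments before switching to Cilium IPAM later.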

With the adoption of Cilium, ECCO Data & AI was able to eliminate its reliance on cloud providers’ network load balancers by implementing direct pod-to-pod communication across Kubernetes clusters via Cilium Cluster Mesh. This reduced the need for intermediary traffic routing and ensured that internal communication could bypass expensive load balancer fees without sacrificing performance. “By leveraging Cilium’s eBPF capabilities, we maintained high-performance networking while completely removing unpredictable traffic costs,” Zubrienko added. Because Cilium unifies clusters into a single network, Data & AI can now route and load balance traffic inside its private storage network efficiently, avoiding the bottlenecks and additional expenses tied to traditional load balancing solutions.
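Direct pod-to-pod routing across clusters only works if the clusters can be told apart and their pod addresses do not collide. The sketch below checks three conditions before joining clusters into a mesh, assuming each cluster is described by its `cluster.name` and `cluster.id` Helm values and its pod CIDR: names must be unique, IDs must be unique and within 1-255 (the range in Cilium’s default configuration), and pod CIDRs must not overlap. The `ClusterSpec` type, cluster names and CIDRs are hypothetical; the case study does not describe ECCO’s cluster layout.

```python
"""Sketch: sanity checks before joining clusters into a Cilium Cluster Mesh.

Assumptions: unique cluster names, unique cluster IDs in 1-255 (default
configuration) and non-overlapping pod CIDRs. ClusterSpec and all example
names and CIDRs are hypothetical.
"""
import ipaddress
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ClusterSpec:
    name: str        # value of the cluster.name Helm setting
    cluster_id: int  # value of the cluster.id Helm setting
    pod_cidr: str    # pod IP range allocated in this cluster


def mesh_problems(clusters: list[ClusterSpec]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    names = [c.name for c in clusters]
    if len(names) != len(set(names)):
        problems.append("cluster names must be unique")
    for c in clusters:
        if not 1 <= c.cluster_id <= 255:
            problems.append(f"{c.name}: cluster.id {c.cluster_id} outside 1-255")
    ids = [c.cluster_id for c in clusters]
    if len(ids) != len(set(ids)):
        problems.append("cluster IDs must be unique")
    # Overlapping pod ranges would make cross-cluster pod IPs ambiguous.
    for a, b in combinations(clusters, 2):
        if ipaddress.ip_network(a.pod_cidr).overlaps(ipaddress.ip_network(b.pod_cidr)):
            problems.append(f"pod CIDRs of {a.name} and {b.name} overlap")
    return problems


if __name__ == "__main__":
    storage = ClusterSpec("storage", 1, "10.10.0.0/16")
    compute = ClusterSpec("compute", 2, "10.20.0.0/16")
    print(mesh_problems([storage, compute]))
```

With the two example clusters the function returns an empty list; adding a third cluster whose pod CIDR is `10.20.128.0/17` would report an overlap with `compute`.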

The improvements also extended beyond cost optimization. The streamlined traffic flow significantly improved latency and network efficiency, allowing ECCO Data & AI to handle the demands of its machine learning workloads more effectively. “Each Spark executor running on a separate machine, or a Trino worker node now directly connects to the storage node that it wants to write data to or read from, so it’s essentially two hosts just talking to each other. When we benchmarked the performance, latency dropped by 33% and tasks that previously took minutes to process were completed in seconds, which directly impacted our decision-making capabilities and operational speed,” Zubrienko noted. This optimization was particularly impactful for their global supply chain operations, where every second counts in forecasting demand and managing inventory.

Improving the network not only optimized costs but also simplified system architecture. “All our applications perceive storage as a single endpoint,” Zubrienko explained. “Regardless of changes, the client applications always see the same URL internally. That consistency simplifies our operations immensely. The huge cost reduction was the main selling point of this project, plus it greatly simplified application deployments, meaning developers will spend no time in configuring storage connections anymore. For algorithms and data preparation, it’s very important to make decisions fast because as we sell shoes, our knowledge about the state of our inventory evolves, and we cannot make optimal decisions if we make them on past data.”
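The case study does not say how the single storage endpoint is implemented. One way to get the same behavior with Cluster Mesh is a Cilium global service: a Service with the same name and namespace is declared in every meshed cluster and annotated with `service.cilium.io/global: "true"`, so clients keep one in-cluster DNS name while Cilium load-balances to backends in any cluster. The sketch below builds such a manifest as a Python dict; the service name `minio`, namespace `storage`, port 9000, selector and the default `cluster.local` cluster domain are all hypothetical assumptions.

```python
"""Sketch: one possible way to give every meshed cluster the same storage URL.

Assumptions: a Cilium global service (annotation service.cilium.io/global="true")
declared with identical name/namespace in each cluster; the service name,
namespace, port, selector and cluster domain below are hypothetical.
"""
import json

GLOBAL_ANNOTATION = "service.cilium.io/global"


def global_service(name: str, namespace: str, port: int, selector: dict) -> dict:
    """Build a Kubernetes Service manifest marked as a Cilium global service."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Cilium merges backends of same-named global services across the mesh.
            "annotations": {GLOBAL_ANNOTATION: "true"},
        },
        "spec": {
            "selector": selector,
            "ports": [{"name": "s3", "port": port, "targetPort": port}],
        },
    }


def storage_url(name: str, namespace: str, port: int) -> str:
    """In-cluster URL clients use; identical in every cluster (assumes cluster.local domain)."""
    return f"http://{name}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    svc = global_service("minio", "storage", 9000, {"app": "minio"})
    print(json.dumps(svc, indent=2))
    print(storage_url("minio", "storage", 9000))
```

Because the URL is derived only from the service name, namespace and port, client configuration does not change when storage nodes are added or moved.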

Incremental Enhancements for Big Improvements


Cilium’s modular design proved to be a game-changer for ECCO Data & AI. The company could adopt features gradually, aligning with its priorities. “One thing I like very much about Cilium is that you can enable what you want step by step,” Zubrienko said. “It doesn’t destroy the whole system, and you can add more when you’re ready.”
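Enabling features step by step maps naturally onto layered Helm values: each rollout step adds a small overlay, and Helm merges values files in order, with later files taking precedence over earlier ones. The sketch below imitates that layering with a recursive dictionary merge. The three steps (chaining, Cluster Mesh, IPv6) mirror features mentioned in this case study, but the overlay contents, the cluster name `storage` and ID `1` are hypothetical, and `deep_merge` is an illustrative helper rather than Helm’s own code.

```python
"""Sketch: adopting Cilium features step by step as layered Helm values.

Assumption: each rollout step is a small values overlay (in practice, separate
-f files passed to `helm upgrade`, later files taking precedence). Overlay
contents, cluster name and cluster ID are hypothetical.
"""
from copy import deepcopy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict: nested maps are merged recursively, other overlay values win."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Ordered rollout steps; each can be applied without touching the previous ones.
STEPS = [
    ("chaining", {"cni": {"chainingMode": "aws-cni", "exclusive": False}}),
    ("cluster mesh", {"cluster": {"name": "storage", "id": 1},
                      "clustermesh": {"useAPIServer": True}}),
    ("ipv6", {"ipv6": {"enabled": True}}),
]


def values_after(step_count: int) -> dict:
    """Effective values once the first `step_count` steps have been rolled out."""
    values: dict = {}
    for _, overlay in STEPS[:step_count]:
        values = deep_merge(values, overlay)
    return values


if __name__ == "__main__":
    for i, (name, _) in enumerate(STEPS, start=1):
        print(f"after {name}: {values_after(i)}")
```

Each step only adds keys, so rolling back a feature amounts to dropping its overlay while earlier steps stay intact.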

Integrating Cilium with MinIO was another milestone. MinIO, an open-source, S3-compatible storage system, allowed ECCO to replace cloud storage, whose costs grew with usage, with a fixed-cost, high-performance alternative. “Our new storage system and networking setup powered by Cilium provided up to 50% reduction in storage operational cost, and it’s all thanks to Cilium enabling a more efficient network. Even with IPv6 networks, enabling Cilium was just a flip of a switch. It was surprisingly straightforward.”

This flexibility opened doors for future enhancements. For example, ECCO Data & AI plans to replace kube-proxy with Cilium’s kube-proxy replacement, which is expected to unlock even higher networking performance, and to explore advanced features like Egress Gateway. “Our goal is to make our platform fully portable between cloud and on-prem environments,” Zubrienko explained. “Cilium is helping us get there.”
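Both planned features have prerequisites that are easy to miss in a values file. The sketch below checks a few of them, assuming the value names of recent Cilium releases: `kubeProxyReplacement=true` (releases before 1.14 used `strict`) needs `k8sServiceHost` and `k8sServicePort`, and `egressGateway.enabled=true` is assumed to need both kube-proxy replacement and `bpf.masquerade=true`. Only these conditions are modeled. Feature compatibility (for example with chaining mode or Cluster Mesh) should be confirmed in the Cilium documentation. The functions `get` and `missing_prerequisites` and the API server hostname are hypothetical.

```python
"""Sketch: catching missing prerequisites before enabling kube-proxy replacement
and Egress Gateway through Cilium Helm values.

Assumptions: value names follow recent Cilium releases; only the prerequisites
checked below are modelled. Helper names and the hostname are hypothetical.
"""


def get(values: dict, dotted: str, default=None):
    """Look up a dotted Helm key such as 'bpf.masquerade' in nested values."""
    node = values
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def missing_prerequisites(values: dict) -> list[str]:
    problems = []
    kpr = get(values, "kubeProxyReplacement") is True
    if kpr:
        # Without kube-proxy, the agent needs a direct API server address, since the
        # `kubernetes` Service ClusterIP is not translated until Cilium itself runs.
        for key in ("k8sServiceHost", "k8sServicePort"):
            if get(values, key) is None:
                problems.append(f"{key} must be set when kubeProxyReplacement=true")
    if get(values, "egressGateway.enabled") is True:
        if not kpr:
            problems.append("egressGateway.enabled requires kubeProxyReplacement=true")
        if get(values, "bpf.masquerade") is not True:
            problems.append("egressGateway.enabled requires bpf.masquerade=true")
    return problems


if __name__ == "__main__":
    planned = {
        "kubeProxyReplacement": True,
        "k8sServiceHost": "api.example.internal",  # hypothetical API server endpoint
        "k8sServicePort": 443,
        "egressGateway": {"enabled": True},
    }
    print(missing_prerequisites(planned))
```

For the `planned` example, the only reported problem is the missing `bpf.masquerade=true`.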

ECCO Data & AI’s journey with Cilium is a testament to the power of open source innovation. Despite its rapid development, Cilium has maintained exceptional reliability and clarity in documentation. Zubrienko remarked, “With Cilium, what I see is what I get. It’s rare to find an open source project that’s so well-maintained. It makes my life much easier, and I can trust it completely.”

The impact of Cilium on ECCO’s business is profound. Beyond technical achievements, it has driven tangible business results. Faster compute and data access improved decision-making, ensuring the right products reached customers in time, at the right place. “Speed is everything in our business,” Zubrienko emphasized. “If we can move goods efficiently and forecast demand accurately, we win. Cilium has become a cornerstone of our strategy. It’s not just a tool; it’s a game-changer.”