Case Study

Ke Holdings Inc.

Scaling machine learning infrastructure with GPU virtualization using Kubernetes and HAMi

Background

Ke Holdings Inc. is an integrated online and offline platform for housing transactions and related services based in China. To support the company’s rapidly growing AI initiatives, a centralized infrastructure team operates the shared machine learning platform used across all business units.

The team provides end-to-end compute services for model development, training, and large-scale inference, supporting both internal research workloads and production-facing AI services. As model adoption and request volume increased across the organization, GPU efficiency and workload isolation became critical platform requirements.

Published:
February 5, 2026

Projects used

Kubernetes, HAMi

By the numbers

3x

improvement in platform GPU utilization

10,000+

pods running simultaneously on the platform

10,000,000+

requests processed daily across the platform

Challenge

As Ke Holdings’ machine learning initiatives scaled, the infrastructure team faced significant challenges in GPU resource management.

Initially, overall GPU utilization was only 13%, a result of the complexity of the multi-cloud environment and diverse workload requirements. This prompted the infrastructure team to seek ways to improve cluster resource utilization.

Solution

[Architecture diagram]

Using the CNCF projects HAMi and Kubernetes as its foundation, Ke Holdings’ infrastructure team designed and implemented AI Studio, a smart computing platform that serves as the basis for the organization’s machine learning infrastructure. Leveraging Kubernetes for orchestration and HAMi for GPU virtualization, AI Studio provides a unified platform that bridges upper-layer SaaS services with the underlying compute resources.

Kubernetes was selected for its stability and robust cluster scheduling and management capabilities, which significantly reduce the operational complexity and maintenance overhead of large-scale clusters. Its place in the CNCF open ecosystem also enables seamless adoption of open-source solutions tailored to different use cases. HAMi was chosen because it was the GPU multiplexing and heterogeneous computing solution that best fit AI Studio’s requirements.
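To illustrate how HAMi-style GPU virtualization surfaces to workloads, a pod can request a slice of a device through HAMi’s extended resource names (`nvidia.com/gpu`, `nvidia.com/gpumem`, `nvidia.com/gpucores`, per HAMi’s public documentation). The manifest below is a hedged sketch; the image name and quantities are hypothetical, not Ke Holdings’ actual configuration:

```yaml
# Hypothetical inference pod requesting a fraction of one GPU via HAMi.
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo
spec:
  containers:
    - name: server
      image: inference-server:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # one virtual GPU
          nvidia.com/gpumem: 4096  # slice of device memory, in MiB
          nvidia.com/gpucores: 30  # approximate share of GPU compute, in %
```

HAMi’s scheduler packs pods like this onto physical devices, which is how many inference replicas can share a single GPU.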

[Architecture diagram]

The team implemented a dual-cluster approach that separates workloads based on their resource requirements. 

This separation ensures that training jobs receive dedicated, predictable resources while inference services achieve high density through memory sharing, eliminating contention between workload types and maximizing overall infrastructure efficiency.
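The two workload profiles could be expressed along these lines (resource names follow HAMi’s conventions; the quantities are illustrative assumptions, not the team’s actual settings):

```yaml
# Training cluster: whole, dedicated GPUs for predictable throughput.
resources:
  limits:
    nvidia.com/gpu: 8          # eight full devices, no sharing
---
# Inference cluster: fractional GPUs so replicas can share devices.
resources:
  limits:
    nvidia.com/gpu: 1
    nvidia.com/gpumem: 2048    # 2 GiB slice of device memory, in MiB
```

Keeping the two request shapes in separate clusters lets the scheduler bin-pack inference replicas densely without ever fragmenting the GPUs reserved for training.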

Impact

By leveraging open-source technologies including HAMi and Kubernetes, the AI Studio platform developed by the infrastructure team has achieved significant results.

The successful integration of HAMi as a foundational component demonstrates how open-source technologies can enable organizations to achieve remarkable infrastructure efficiency. 

Kubernetes serves as the underlying platform foundation, enabling stable operation of more than ten million daily business requests and more than 10,000 concurrent pods through its robust scheduling and management capabilities. By leveraging HAMi’s GPU multiplexing and heterogeneous scheduling optimizations, the cluster’s GPU utilization has increased nearly 3x.

Future plans

Ke Holdings’ infrastructure team continues to innovate and expand the platform on top of HAMi and Kubernetes.