Case Study

SF Technology

Effective GPU: A Heterogeneous AI Virtualization Pooling solution by SF technology built on top of HAMi

About

SF Technology, the technology arm of SF Express—China’s leading integrated logistics service provider—operates one of the largest and most advanced logistics networks in the world. With a strong focus on digital transformation and intelligent logistics, SF Technology leverages cutting-edge AI and cloud native technologies to optimize operations, enhance customer experience, and drive business innovation.

Challenge

As AI workloads grow in scale and diversity, SF Technology and similar organizations face significant challenges in maximizing the utilization of heterogeneous compute resources such as GPUs, NPUs, and other accelerators:

  1. Low resource utilization: Static allocation leads to a large amount of compute and memory resources remaining idle for long periods, making them unavailable for efficient reuse by other workloads.
  2.  High operational costs: Organizations continuously pay for underutilized resources, significantly increasing the total cost of ownership for infrastructure.
  3. Inflexible scheduling: Once resources are statically assigned, the scheduler cannot dynamically adjust allocations based on real-time business needs, limiting the elasticity and scalability of the cluster.

Enterprises need a unified, flexible, and efficient way to manage and share heterogeneous AI resources across clusters and workloads, while supporting both mainstream and domestic hardware platforms.

Solution

To address these challenges, the team developed EffectiveGPU, an in-house solution built on top of the CNCF Sandbox project HAMi to provide GPU pooling technologies.

Challenges:
Industry:
Location:
Cloud Type:
Product Type:
Published:
September 18, 2025

Projects used

By the numbers

0 Changes

To Nvidia driver, linux kernel, task image and source code

Up to 57%

GPU Savings for production and test clusters

Up to 100%

GPU utilization improvement with GPU virtualization

pic 1: EffectiveGPU architect
pic 1: EffectiveGPU architect

EffectiveGPU leverages several key capabilities provided by HAMi:

EffectiveGPU further extends the capabilities of HAMi and provides the following advanced features:

By leveraging these technologies, the team can deploy more services with fewer GPUs, dynamically adjust resource allocation based on real-time demand, and support heterogeneous compute environments.

Impact

1. Large Model Inference Services

In AI production model services, traditional GPU resource allocation methods often result in low resource utilization and high costs. By adopting EffectiveGPU technology, resource utilization can be significantly improved and operational costs reduced. For example, AI production model services that have switched to EffectiveGPU deployed 65 services using only 28 GPUs, saving 37 GPUs. This approach not only increases GPU utilization but also makes service deployment more flexible, allowing dynamic resource allocation based on actual needs and avoiding resource waste.

2. Test Service Cluster Scenarios

In cluster testing services, flexible resource allocation and efficient utilization are crucial for testing efficiency and cost control. EffectiveGPU technology enables test services to flexibly allocate GPU resources according to different testing task requirements through partitioning. For instance, Testing services using EffectiveGPU deployed 19 services on 6 test GPUs, saving 13 GPUs. This not only improves testing efficiency but also reduces testing costs, enabling more rational resource utilization.

3. Speech Recognition Scenarios

Speech recognition services require efficient computing resources to ensure real-time performance and accuracy. EffectiveGPU provides flexible support for speech recognition through priority scheduling and resource over-provisioning. GPU resources can be dynamically allocated based on the urgency and resource requirements of speech recognition tasks, ensuring that high-priority tasks receive sufficient resources in a timely manner and improving the quality of speech recognition services.

4. Domestic AI Hardware Adaptation Scenarios

EffectiveGPU technology supports not only mainstream GPU hardware but also domestic AI computing platforms such as Huawei Ascend and Baidu Kunlun. This provides strong support for the application and promotion of heterogeneous AI technologies, enabling efficient management and resource utilization in localized environments. For example, in scenarios using heterogeneous AI chips, EffectiveGPU’s scheduling and management can fully leverage the advantages of heterogeneous AI computing devices, promoting the development and application of heterogeneous AI technologies.