Case Study

Xiaohongshu

How Xiaohongshu (RedNote) handled TikTok refugee surges with Karmada

Company overview

Xiaohongshu (RedNote) is a leading lifestyle and social e-commerce platform in China with over 300 million monthly active users. The platform’s core business centers on search, advertising, and recommendations, complemented by social networking and e-commerce capabilities. These services demand massive computational resources and handle enormous data volumes, with individual index tables reaching terabyte scale.

To meet these resource demands while maintaining business agility, Xiaohongshu operates a hybrid cloud infrastructure combining self-built data centers with multiple public cloud providers, with Karmada as the core orchestration layer. This architecture enables elastic scaling and ensures the platform can respond quickly to traffic surges, spanning 200+ Kubernetes clusters with over 10 million CPU cores.

[Architecture diagram]

“Karmada has become essential to our multi-cloud strategy, enabling us to manage resource fragmentation as a competitive advantage rather than an operational burden. The platform’s Kubernetes-native approach meant zero migration cost, while its extensibility allowed us to build production-grade features. Most importantly, during periods of unexpected traffic surges, Karmada’s federation capabilities proved invaluable—we could elastically burst to cloud resources for specific services rather than entire chains, delivering the lowest risk and cost profile during peak demand.”

Yuqi Huang, Director of Cloud Native Infrastructure at Xiaohongshu
Location: China
Cloud Type: Hybrid
Published:
March 15, 2026

Projects used

Karmada

By the numbers

10x

traffic spike during TikTok migration event

200+

Kubernetes clusters with over 10 million CPU cores

300+

million monthly active users

The Challenge: Resource fragmentation

As Xiaohongshu’s business grew rapidly, its infrastructure demands quickly outpaced the capacity of individual Kubernetes clusters. Both self-built data centers and managed Kubernetes services (TKE/ACK) from cloud vendors impose cluster size limits for stability reasons. To accommodate the expanding workloads, the infrastructure evolved into dozens of Kubernetes clusters spread across multiple cloud providers and self-built data centers.

While this multi-cluster approach solved the capacity problem, it introduced a new challenge: resource fragmentation.

[Architecture diagram]

The fragmented infrastructure created multiple operational problems:

  1. Business teams had to know which cluster ran their workloads and manage deployments cluster by cluster
  2. Capacity was stranded: one cluster could sit idle while another was saturated
  3. GPU resources were split into per-cluster silos that could not be pooled
  4. Bursting to the cloud required manual handoffs between business and infrastructure teams

The ideal deployment model

[Architecture diagram]

The ideal deployment model is straightforward: business teams select a region, and the platform provides a unified resource pool. This directly addresses the pain points above. Business teams no longer need to perceive cluster details; they deploy once to a region and let the platform handle placement. Infrastructure gains the flexibility to balance capacity, pool GPUs, and burst to the cloud without handoffs to the business side.

The TikTok refugee crisis: A wake-up call

In January 2025, when TikTok faced potential restrictions, millions of users suddenly migrated to Xiaohongshu, causing massive traffic spikes. The self-built data centers’ resources were immediately exhausted. The team faced three difficult options:

  1. Emergency server procurement: with Chinese New Year approaching, delivery timelines were uncertain and costs would be astronomical
  2. Traffic redistribution: Shift more traffic to cloud regions, but this would require scaling entire service chains, not just individual services, dramatically increasing costs and complexity
  3. Leverage federation: Use the federated cluster system to elastically burst to cloud resources only for services under pressure

The third option, enabled by Karmada-based federation, proved to be the solution.

The solution: Karmada-based multi-cluster federation

Xiaohongshu chose to build a federated cluster system with Karmada at its core, focusing on two key principles:

  1. Hide cluster complexity from business teams: Enable teams to think in terms of regions, not individual clusters
  2. Unify resource scheduling: Create a global resource pool from fragmented clusters

[Architecture diagram]

Architecture design principles

The solution included three core components:

1. Unified API gateway

Xiaohongshu maintained Kubernetes API compatibility to ensure zero migration cost for existing platforms and applications. They built a custom access layer that intelligently routes requests:

This design allows standard Kubernetes clients (kubectl, client-go) to work seamlessly without modification.

[Flow diagram]
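The routing behavior of such an access layer can be illustrated with a toy rule. This is a minimal sketch, not Xiaohongshu's actual implementation; `FEDERATED_KINDS` and the backend endpoint names are assumptions made for the example.

```python
# Hypothetical sketch of an access-layer routing rule. Standard clients
# (kubectl, client-go) talk to one endpoint; the layer decides where each
# request actually lands. All names below are illustrative assumptions.

FEDERATED_KINDS = {"Deployment", "StatefulSet", "Service"}

def route(request):
    """Pick a backend API server for a Kubernetes API request."""
    if request["kind"] in FEDERATED_KINDS:
        # Federated workloads get the unified, federation-level view.
        return "karmada-apiserver"
    # Cluster-scoped resources (e.g. Nodes) go to the owning member cluster.
    return f"member-{request['cluster']}"

route({"kind": "Deployment", "cluster": "dc-shanghai"})  # "karmada-apiserver"
route({"kind": "Node", "cluster": "dc-shanghai"})        # "member-dc-shanghai"
```

A real access layer would also consider namespaces, labels, and whether a resource is claimed by a federation policy; the point is only that routing happens behind an unchanged Kubernetes API surface.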

2. Multi-tier scheduling system

The scheduling architecture uses a “self-built first, cloud backup” policy: workloads are placed in self-built data centers by default and burst to public cloud clusters only when local capacity is exhausted.
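A toy placement function can illustrate the “self-built first, cloud backup” idea under a simplified capacity model. The function and cluster names are hypothetical, not the actual scheduler.

```python
# Toy placement: fill self-built (IDC) clusters first, then spill the
# remaining replicas to public cloud clusters. Illustrative only.

def place_replicas(replicas, clusters):
    """clusters: dicts with 'name', 'tier' ('idc' or 'cloud'), and 'free' capacity."""
    plan, remaining = {}, replicas
    # Sort so self-built clusters come before cloud clusters.
    for cluster in sorted(clusters, key=lambda c: c["tier"] != "idc"):
        if remaining == 0:
            break
        n = min(remaining, cluster["free"])
        if n:
            plan[cluster["name"]] = n
            remaining -= n
    return plan, remaining

plan, unplaced = place_replicas(10, [
    {"name": "idc-a", "tier": "idc", "free": 6},
    {"name": "cloud-b", "tier": "cloud", "free": 20},
])
# idc-a absorbs 6 replicas; the remaining 4 burst to cloud-b.
```

A production scheduler would weigh far more signals (affinity, cost, network topology), but the ordering preference is the core of the policy.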

3. Enhanced workload orchestration

In Xiaohongshu’s practice, workload changes require strict control over the rollout pace, similar to the maxUnavailable and maxSurge settings on a Deployment. Working with the community, they designed an extension that implements federation-level rolling update strategies by extending Karmada’s Resource Interpreter.

[Diagram]

As shown in the diagram, during each application update, no matter how many clusters the replicas span, each rolling phase updates only one replica at a time, significantly improving change safety.
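The pacing rule can be modeled as follows. This is a sketch under simplifying assumptions; the real mechanism is implemented via Karmada's Resource Interpreter, not a standalone function.

```python
# Federation-level rollout pacing: no matter how replicas are spread across
# member clusters, each phase updates at most max_unavailable replicas total.
# Hypothetical helper for illustration only.

def rollout_phases(distribution, max_unavailable=1):
    """distribution: {cluster: replica_count}; yields per-phase update batches."""
    pending = [(cluster, i) for cluster, n in distribution.items() for i in range(n)]
    for start in range(0, len(pending), max_unavailable):
        yield pending[start:start + max_unavailable]

# 3 replicas across two clusters -> 3 phases of one replica each.
phases = list(rollout_phases({"idc-a": 2, "cloud-b": 1}))
```

The key property is that the batch size is enforced globally across the federation, not per cluster, so a workload split across many clusters still changes one replica at a time.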

Federated HPA

Traditional HPA operates at the individual cluster level, requiring each cluster to independently monitor metrics and make scaling decisions. This approach becomes inefficient in a federated environment where workloads span multiple clusters. Xiaohongshu moved the autoscaling logic to the federation layer, creating a unified view of application health and resource utilization across all clusters.

The Federated HPA Controller (FHPA) collects pod-level metrics from member clusters, aggregates them at the workload level, and makes intelligent scaling decisions based on the complete application state.

[Architecture diagram]

This architecture enables more efficient resource utilization across clusters. For example, during traffic spikes, FHPA can intelligently scale up replicas in the most appropriate clusters based on available capacity and scheduling policies, rather than requiring each cluster to independently react to local metrics.
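Conceptually, the FHPA decision reduces to aggregating pod metrics across clusters and applying the standard HPA formula, desired = ceil(currentReplicas × currentMetric / targetMetric). A minimal model follows; the function name and data shapes are assumptions, not the controller's API.

```python
import math

def federated_desired_replicas(per_cluster_cpu, target_cpu):
    """per_cluster_cpu: {cluster: [per-pod CPU utilization]}; target_cpu: e.g. 0.5."""
    # Aggregate pod metrics from all member clusters into one workload view.
    pods = [u for utils in per_cluster_cpu.values() for u in utils]
    current = len(pods)
    avg = sum(pods) / current
    # Standard HPA scaling formula, applied once at the federation level.
    return math.ceil(current * avg / target_cpu)

# A surge pushes average utilization to 0.8 against a 0.5 target:
# 4 pods at 0.8 -> ceil(4 * 0.8 / 0.5) = 7 desired replicas overall.
federated_desired_replicas({"idc-a": [0.8, 0.8], "cloud-b": [0.8, 0.8]}, 0.5)
```

Where the extra replicas land is then a separate placement decision, driven by available capacity and scheduling policies rather than by each cluster's local view.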

Custom StatefulSet

Xiaohongshu developed a stateful workload controller with customizable index orchestration for search and recommendation services. This enables:

[Architecture diagram]

Results and benefits

Search and recommendation: Precise management at scale

The search and recommendation services are among the most critical systems at Xiaohongshu, processing massive index tables reaching terabyte scale. With the custom StatefulSet controller, these large-scale stateful workloads can now be precisely managed and orchestrated across the federation:

GPU resource pool unification for LLM inference

Large language model inference had unique challenges:

[Architecture diagram]

With federation:

Surviving the traffic surge

During the TikTok user migration crisis, the federation system proved its value:

This approach delivered the lowest risk and cost profile during the crisis.

[Architecture diagram]

Why Karmada?

Xiaohongshu evaluated multiple federation solutions and selected Karmada for several reasons:

  1. Kubernetes-native API: No application refactoring required; standard Kubernetes clients work without modification
  2. Resource-centric design: Focus on resource scheduling aligns with Xiaohongshu’s efficiency goals
  3. Flexible cluster access: Both push and pull modes support diverse network topologies
  4. Active community: Open governance and responsive community support ongoing improvements
  5. Extensibility: Provides out-of-the-box and configurable capabilities, making it convenient to extend controllers and scheduling policies

Future plans

Xiaohongshu continues to expand its Karmada-based federation with plans to:

  1. AI Training and Big Data Workload Federation: Extending federation support to AI training workloads and Spark big data jobs, leveraging community contributions and existing ecosystem solutions
  2. Contributing Extensions Back: Sharing Xiaohongshu’s custom enhancements, including Fleet-Root and other production-grade features, back to the Karmada community to benefit other users
  3. Intelligent Scheduling Policies: Contributing cost and latency-aware scheduling strategies to help optimize resource allocation across hybrid cloud environments
  4. Community Performance Optimization: Joining the Karmada community’s performance optimization efforts to help continuously improve the platform’s efficiency and reliability

Conclusion

Xiaohongshu’s journey with Karmada demonstrates how multi-cluster federation can transform hybrid cloud operations from a management burden into a competitive advantage. By focusing on resource efficiency, maintaining Kubernetes compatibility, and extending Karmada thoughtfully, the company built a system that not only solved immediate operational pain points but also provided the foundation for handling unexpected challenges—like a sudden 10x traffic spike from millions of new users.

The success during the TikTok migration event proved the architecture’s core idea: unified resource scheduling across fragmented clusters enables both operational efficiency and business resilience. As Xiaohongshu continues to scale, Karmada remains central to their strategy for managing complexity while maintaining agility in a dynamic multi-cloud environment.