llm-d

llm-d is a Kubernetes-native, high-performance distributed LLM inference framework built on vLLM and the Kubernetes Gateway API Inference Extension. It provides intelligent inference scheduling, prefix-cache-aware routing, prefill/decode disaggregation, hierarchical KV cache offloading, and traffic- and hardware-aware autoscaling across NVIDIA, AMD, Intel, and Google TPU accelerators.

llm-d was accepted to CNCF on March 12, 2026, at the Sandbox maturity level.

Project Insights

These key metrics provide insight into development activity, community engagement, and project health. Powered by LFX Insights.

Health Score: Excellent (85)

Total contributors: 2,822 (+130% vs. previous year)

Total contributing organizations: 774 (+90% vs. previous year)

GitHub Stars: 2,254 (+92% vs. previous year)

GitHub Forks: 459 (+496% vs. previous year)

Software Value: $1.6M

First commit: April 29, 2025