llm-d is a Kubernetes-native, high-performance distributed LLM inference framework built on vLLM and the Kubernetes Gateway API Inference Extension, providing intelligent inference scheduling, prefix-cache-aware routing, prefill/decode disaggregation, hierarchical KV offloading, and traffic- and hardware-aware autoscaling across NVIDIA, AMD, Intel, and Google TPU accelerators.
llm-d was accepted to CNCF on March 12, 2026 at the Sandbox maturity level.
Project Insights
Key metrics that provide insight into development activity, community engagement, and project health. Powered by LFX Insights.
Health Score: Excellent (85)
Health Score measures a project’s overall trustworthiness across four key areas: contributors, development, popularity, and security.
Total contributors: 2,822 (+130% vs. previous year)
Total contributing organizations: 774 (+90% vs. previous year)
GitHub Stars: 2,254 (+92% vs. previous year)
GitHub Forks: 459 (+496% vs. previous year)
Software Value: $1.6M
First commit: April 29, 2025