llm-d

llm-d is a Kubernetes-native, high-performance distributed LLM inference framework built on vLLM and the Kubernetes Gateway API Inference Extension. It provides intelligent inference scheduling, prefix-cache-aware routing, prefill/decode disaggregation, hierarchical KV offloading, and traffic- and hardware-aware autoscaling across NVIDIA, AMD, Intel, and Google TPU accelerators.

llm-d was accepted to CNCF on March 12, 2026 at the Sandbox maturity level.

Key metrics provide insight into development activity, community engagement, and project health. Powered by LFX Insights.

Health Score: Excellent (84)

Health Score measures a project’s overall trustworthiness across four key areas: contributors, development, popularity, and security.

Total contributors: 3,140 (+108% vs. previous year)

Total contributing organizations: 840 (+71% vs. previous year)

GitHub Stars: 2,490 (+75% vs. previous year)

GitHub Forks: 522 (+388% vs. previous year)

Software Value: $1.6M

First commit: April 29, 2025

