The world of AI workloads is changing fast. A few years ago, “AI on Kubernetes” mostly meant running long training jobs. Today, with the rise of Large Language Models (LLMs), the focus has expanded to complex inference services and Autonomous Agents. The industry consensus, backed by CNCF’s latest Annual Cloud Native Survey, is clear: Kubernetes has become the essential platform for intelligent systems, and the shift from traditional training jobs to real-time inference and agents is transforming cloud native infrastructure.

This shift creates new challenges: schedulers must handle far greater scale and churn, LLM inference must balance throughput against cost, and AI Agents need lifecycle primitives that Kubernetes does not provide out of the box.

The Volcano community is responding to these needs. With the release of Volcano v1.14, Kthena v0.3.0, and the new AgentCube, Volcano is transforming from a batch computing tool into a Full-Scenario, AI-Native Unified Scheduling Platform.

1. Volcano v1.14: Breaking Limits on Scale and Speed

As clusters expand and workloads diversify, scheduler bottlenecks can degrade performance. Volcano v1.14 introduces a major architectural evolution to address this.

Scalable Multi-Scheduler Architecture

Traditional setups often rely on static resource division, leading to wasted capacity. Volcano v1.14 introduces a Sharding Controller that dynamically calculates resource pools for different schedulers (Batch, Agent, etc.) in real time.
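The core idea can be sketched as a controller that re-partitions cluster capacity in proportion to each scheduler’s live demand. The following is an illustrative Python sketch of proportional sharding, not Volcano’s actual API; `SchedulerShard` and `compute_shards` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class SchedulerShard:
    name: str
    pending_pods: int  # live demand signal for this scheduler

def compute_shards(total_cpu: int, shards: list[SchedulerShard]) -> dict[str, int]:
    """Split cluster CPU among schedulers in proportion to pending demand."""
    total_demand = sum(s.pending_pods for s in shards)
    if total_demand == 0:
        # No demand anywhere: fall back to an even split.
        return {s.name: total_cpu // len(shards) for s in shards}
    return {s.name: total_cpu * s.pending_pods // total_demand for s in shards}

pools = compute_shards(1000, [SchedulerShard("batch", 300), SchedulerShard("agent", 100)])
print(pools)  # {'batch': 750, 'agent': 250}
```

Under a static 50/50 split, an idle agent pool would strand capacity the batch pool needs; recomputing as demand changes keeps both pools elastic.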

High-Throughput Agent Scheduling

Standard Kubernetes scheduling often struggles with the high churn rate of AI Agents. The new Agent Scheduler (Alpha) in v1.14 provides a high-performance fast path designed specifically for short-lived, high-concurrency tasks.

Enhanced Resource Efficiency

To optimize infrastructure costs, v1.14 adds support for general-purpose Linux distributions (Ubuntu, CentOS), bringing enterprise features like CPU Throttling and Memory QoS to a wider range of environments. Additionally, native support for Ascend vNPU maximizes the utilization of diverse AI hardware.

2. Kthena v0.3.0: Efficient and Scalable LLM Serving

The CNCF survey has identified AI inference as the next major cloud native workload, representing the bulk of long-term cost, value, and complexity. Kthena v0.3.0 directly addresses this challenge, introducing a specialized Data Plane and Control Plane architecture to solve the speed and cost balance for serving large models.

Optimized Prefill-Decode Disaggregation

Separating the “Prefill” and “Decode” phases of LLM inference improves efficiency, but it introduces heavy cross-node traffic: the KV cache computed during prefill must be shipped to the decode workers. Kthena’s optimized disaggregation targets this transfer path so the gains are not lost to network overhead.
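To see why the traffic is heavy: the key-value (KV) cache produced during prefill grows linearly with prompt length and must move to the decode worker. A back-of-envelope estimate using standard transformer arithmetic (not Kthena-specific code):

```python
def kv_cache_bytes(prompt_tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache a prefill worker must ship to the decode worker.

    Per token and per layer we store one key and one value vector
    (2 * kv_heads * head_dim elements), typically in fp16 (2 bytes each).
    """
    return prompt_tokens * layers * 2 * kv_heads * head_dim * bytes_per_elem

# A 7B-class model shape: 32 layers, 32 KV heads, head dimension 128.
per_request = kv_cache_bytes(prompt_tokens=4096, layers=32, kv_heads=32, head_dim=128)
print(f"{per_request / 2**20:.0f} MiB per 4K-token request")  # 2048 MiB
```

At roughly 2 GiB per 4K-token request for a 7B-class model, disaggregated serving lives or dies on how efficiently this transfer is handled.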

Simplified Deployment with ModelBooster

Deploying large models typically involves managing a collection of fragmented Kubernetes resources. ModelBooster consolidates this into a single, high-level definition of the model service.
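The pattern can be sketched as one high-level spec expanding into the separate objects a user would otherwise hand-maintain. The function and field names below are hypothetical illustrations, not ModelBooster’s real CRD:

```python
def expand_model_spec(name: str, model_uri: str, replicas: int) -> list[dict]:
    """Expand one high-level model spec into the fragmented objects a user
    would otherwise write and keep in sync by hand."""
    return [
        {"kind": "Deployment", "metadata": {"name": name},
         "spec": {"replicas": replicas,
                  "containers": [{"name": "server",
                                  "env": [{"name": "MODEL_URI", "value": model_uri}]}]}},
        {"kind": "Service", "metadata": {"name": name},
         "spec": {"ports": [{"port": 8000}]}},
        {"kind": "HorizontalPodAutoscaler", "metadata": {"name": name},
         "spec": {"minReplicas": replicas}},
    ]

objects = expand_model_spec("llama-demo", "hf://org/model", replicas=2)
print([o["kind"] for o in objects])
```

The value of the pattern is that the expansion logic, not the user, keeps names, labels, and replica counts consistent across the generated objects.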

Cost-Efficient Heterogeneous Autoscaling

Running LLMs exclusively on top-tier GPUs can be cost-prohibitive. Kthena’s heterogeneous autoscaling scales workloads across mixed accelerator tiers, matching capacity to demand at the lowest cost.
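One way to picture heterogeneous autoscaling is a planner that fills throughput demand from the best tokens-per-dollar tier first. A simplified greedy sketch with made-up tier numbers, not Kthena’s actual algorithm:

```python
def plan_replicas(demand_tps: int, tiers: list[dict]) -> dict[str, int]:
    """Greedy cost-per-throughput plan across heterogeneous GPU tiers.

    Fill from the cheapest cost-per-token tier first, respecting each
    tier's replica limit, until demand (tokens/sec) is covered.
    """
    plan: dict[str, int] = {}
    remaining = demand_tps
    for tier in sorted(tiers, key=lambda t: t["cost_per_hr"] / t["tps"]):
        if remaining <= 0:
            break
        need = -(-remaining // tier["tps"])          # ceiling division
        replicas = int(min(need, tier["max_replicas"]))
        plan[tier["name"]] = replicas
        remaining -= replicas * tier["tps"]
    return plan

tiers = [
    {"name": "h100", "tps": 100, "cost_per_hr": 10.0, "max_replicas": 2},
    {"name": "a10",  "tps": 20,  "cost_per_hr": 1.0,  "max_replicas": 10},
]
plan = plan_replicas(330, tiers)
print(plan)  # {'a10': 10, 'h100': 2}
```

Here the cheaper tier absorbs the bulk of the load and the premium tier covers only the overflow, which is exactly how mixed fleets beat an all-top-tier deployment on cost.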

3. AgentCube: Serverless Infrastructure for AI Agents

While Kubernetes provides a solid infrastructure foundation, it lacks specific primitives for AI Agents. AgentCube bridges this gap with specialized capabilities.

Instant Startup via Warm Pools

Agents require immediate responsiveness that standard container startup times cannot match. AgentCube maintains warm pools of pre-initialized sandboxes, so an agent starts by claiming a ready environment instead of waiting on a cold start.
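The warm-pool idea can be sketched as a queue of pre-booted sandboxes: a request pays a queue pop instead of a cold container start. Illustrative only; the class and timing values below are hypothetical:

```python
import collections
import time

class WarmPool:
    """Toy warm pool: sandboxes are booted ahead of demand so an agent
    request pays a queue pop, not a cold container start."""
    def __init__(self, target_size: int, cold_start_s: float = 2.0):
        self.cold_start_s = cold_start_s
        self.target_size = target_size
        self.pool = collections.deque(self._boot() for _ in range(target_size))

    def _boot(self) -> str:
        # Stand-in for pulling an image and starting a sandbox.
        return f"sandbox-{time.monotonic_ns()}"

    def acquire(self) -> tuple[str, float]:
        if self.pool:
            return self.pool.popleft(), 0.0        # warm: effectively instant
        return self._boot(), self.cold_start_s     # cold fallback

    def refill(self) -> None:
        # Run in the background to restore the pool to its target size.
        while len(self.pool) < self.target_size:
            self.pool.append(self._boot())

pool = WarmPool(target_size=2)
sandbox, latency = pool.acquire()
print(latency)  # 0.0 -- served from the warm pool
```

The pool size is the knob: larger pools buy lower tail latency at the cost of idle resources, which is where a scheduler-managed pool earns its keep.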

Native Session Management

Unlike typical stateless microservices, AI Agents must persist state across multi-turn interactions. AgentCube makes session management a native primitive, preserving conversational context between requests.
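Conceptually, a native session layer keys agent state by session ID and keeps it alive across turns with an idle TTL. A minimal sketch of that idea, not AgentCube’s actual interface:

```python
import time

class SessionStore:
    """Toy session layer: multi-turn agent state survives across requests,
    keyed by session id, evicted after an idle TTL."""
    def __init__(self, ttl_s: float = 600.0):
        self.ttl_s = ttl_s
        self._sessions: dict[str, dict] = {}

    def append_turn(self, session_id: str, role: str, content: str) -> None:
        now = time.monotonic()
        sess = self._sessions.setdefault(session_id, {"history": [], "last_seen": now})
        sess["history"].append({"role": role, "content": content})
        sess["last_seen"] = now

    def history(self, session_id: str) -> list[dict]:
        return self._sessions.get(session_id, {"history": []})["history"]

    def evict_idle(self) -> int:
        now = time.monotonic()
        stale = [k for k, s in self._sessions.items() if now - s["last_seen"] > self.ttl_s]
        for k in stale:
            del self._sessions[k]
        return len(stale)

store = SessionStore()
store.append_turn("s1", "user", "book a flight")
store.append_turn("s1", "agent", "which date?")
print([t["role"] for t in store.history("s1")])  # ['user', 'agent']
```

Making this a platform primitive, rather than per-agent plumbing, is what lets agents scale to zero without losing their conversations.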

Serverless Abstraction

Developers want to focus on agent logic rather than server management. AgentCube’s serverless abstraction handles provisioning, scaling, and lifecycle concerns on their behalf.

Conclusion

Volcano has evolved beyond batch jobs. With v1.14, Kthena, and AgentCube, we now provide a comprehensive platform for the entire AI lifecycle: from training foundation models, to serving them at scale, to powering the next generation of intelligent agents.

By embracing cloud native principles to deliver scalable, reliable infrastructure for the AI lifecycle, Volcano is contributing to the community’s goal of ensuring AI workloads behave predictably at scale. As organizations seek consistent and portable AI infrastructure (a concept championed by initiatives like the Kubernetes AI Conformance Program), Volcano is positioning itself as a core component of that solution.

We invite you to explore these new features and join us in building the future of AI infrastructure.

If you are attending KubeCon + CloudNativeCon Europe, we encourage you to stop by our booth, P-14A, in the Project Pavilion to say hi and learn more about the latest updates.