Kgateway may be new to the CNCF, but it’s not new to the market: it was born as “Gloo” in 2018, a project to provide modern API management within Kubernetes. Gloo built a large user base during the first major cloud native evolution, from monoliths to microservices. Now, in 2025, another evolution is taking place, as users move from APIs to agents — with new workloads requiring new connectivity features. Kgateway brings the solid foundations of the past and innovation that sets the project up as the gateway of choice for the future, with new features for calling and serving LLMs as well as enhancing the use of agentic AI.

Traditional cloud native networking

Ingress

An ingress, edge router, or application gateway is a service that routes traffic to services running inside a Kubernetes environment. (It might be directly accessible from the Internet, or routed to by another corporate gateway.)

Traditionally, this service was managed by a Kubernetes object also called an Ingress, and as such “ingress” in a Kubernetes context normally refers to an operator configuration where a control plane configures a hardware or software proxy server. The traffic can include API traffic, but can also include general traffic like web pages and images.

There are three major components that make up any ingress solution: the configuration API, the data plane (the proxy that actually handles traffic), and the control plane (the software that configures that proxy).

The Kubernetes Gateway API is a project designed to standardize the configuration language, replacing the legacy Ingress API with an extensible API adopted across the industry.
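As a sketch of what that standardized configuration looks like, the snippet below models a minimal Gateway API `HTTPRoute` as a Python dict (in practice this would be a YAML manifest applied with `kubectl`; the resource names `my-gateway` and `echo-service` are placeholders, not anything from kgateway itself):

```python
# Minimal Gateway API HTTPRoute, expressed as a Python dict for illustration.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "echo-route"},
    "spec": {
        # Attach the route to a Gateway managed by some control plane.
        "parentRefs": [{"name": "my-gateway"}],
        "rules": [
            {
                # Match requests whose path starts with /echo ...
                "matches": [{"path": {"type": "PathPrefix", "value": "/echo"}}],
                # ... and forward them to the echo-service backend on port 8080.
                "backendRefs": [{"name": "echo-service", "port": 8080}],
            }
        ],
    },
}
```

Because the API is standard, the same route definition works with any conformant implementation; only the referenced Gateway ties it to a particular data plane and control plane.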

Ingress projects therefore differentiate on their data plane (Envoy, HAProxy, NGINX, Traefik, etc.) and control plane.

Envoy is the future-proof engine for cloud native Layer 7 (L7): it was built from the ground up for such use cases. Kgateway (then Gloo) was one of the earliest adopters of Envoy.

Depending on the control plane implementation, there can be substantial performance and scalability differences. The Kgateway control plane is one of the fastest, most battle-tested control planes for Envoy, reflecting configuration changes more quickly while using less memory and CPU.

API gateway

An API gateway is a service that provides centralized gateway functionality for API traffic. The concept took off as service-oriented architectures were growing in popularity in the 2005-2015 era, with early vendors targeting backends that were run on Java. As Kubernetes and microservices became popular, the need arose to offer API gateway functionality that integrated with it; this was the initial use case for Gloo.

Kgateway supports many API use cases above and beyond those of a traditional ingress:

These advanced features are implemented as extensions to the Gateway API, which means you can use a consistent, thoughtfully designed API without ever having to “break glass” into configuring Envoy directly.

Service mesh waypoint

Ambient mesh is the new sidecarless data plane mode in the Istio service mesh. One of the key innovations of ambient mesh is that it splits Istio’s functionality into two distinct layers: a lightweight, secure overlay layer (implemented by a purpose-built node proxy) and a Layer 7 processing layer (implemented by L7 proxies called waypoints). The design of ambient mesh purposefully kept the secure overlay layer very lightweight with minimal function. The L7 layer was designed to be feature-rich and pluggable, enabling you to use your preferred L7 proxy as your waypoint.

Kgateway is the first project that can be used as a pluggable waypoint for Istio. Although it is built on the same Envoy engine as Istio’s own waypoint implementation, kgateway has differentiators that make it a compelling alternative waypoint.

Learn more about extending ambient mesh with Kgateway as a waypoint

AI use cases

Almost overnight, large language models (LLMs) and generative AI became the ultimate API use case. For applications that need to communicate with external services (like cloud APIs or LLM providers such as OpenAI), traffic control becomes even more critical. In such cases, using kgateway as an egress gateway can help manage costs, API key security, and caching, while ensuring sensitive data isn’t leaked.

Secure LLM usage with the AI gateway

AI Gateway unleashes developer productivity and accelerates AI innovation by providing a unified API interface that developers can use to access and consume AI services from multiple LLM providers. You can apply additional traffic management, security, and resiliency policies to the requests to your LLM provider. This set of policies allows you to centrally govern, secure, control, and audit access to your LLM providers.
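To make the “unified API” idea concrete, here is an illustrative sketch (not kgateway’s actual implementation) of the core routing decision such a gateway makes: one request format in, different upstream providers out, selected by the requested model. The model names and upstream URLs are examples, not a configuration kgateway ships:

```python
# Illustrative sketch: a unified chat-completions front end that routes a
# single request shape to different upstream LLM providers by model name.
PROVIDERS = {
    "gpt-4o": "https://api.openai.com/v1/chat/completions",
    "claude-3-5-sonnet": "https://api.anthropic.com/v1/messages",
    "llama-3-70b": "http://llama.internal:8000/v1/chat/completions",
}

def route_llm_request(request: dict) -> str:
    """Return the upstream URL for a unified chat request, or raise."""
    model = request.get("model")
    if model not in PROVIDERS:
        raise ValueError(f"no provider configured for model {model!r}")
    return PROVIDERS[model]

url = route_llm_request(
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}
)
print(url)  # https://api.openai.com/v1/chat/completions
```

A gateway doing this centrally is also the natural place to attach the policies mentioned above: API keys stay server-side, and rate limits and audit logs apply uniformly across providers.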

Some of kgateway’s AI gateway features include:

Host models with the Inference Gateway

Unlike traditional web traffic, AI inference requests have unique characteristics that make conventional load-balancing techniques less effective. Inference requests often take much longer to process, sometimes minutes rather than milliseconds, and involve significantly larger payloads. A single request can consume an entire GPU, making scheduling decisions far more impactful than those for standard API workloads. At times, requests need to queue up while others are being processed.

To address these challenges, the Kubernetes Gateway API Inference Extension introduces APIs for inference-aware routing, which are implemented in kgateway. Instead of forwarding the request to an Envoy load-balancing pool, kgateway invokes an inference-aware endpoint selection extension. This extension evaluates the live state of model-serving instances by watching Prometheus metrics, considering factors such as LLM queue depth and available GPU memory. Based on these real-time metrics, it selects the optimal model server pod for the request, ensuring better resource utilization and lower latency. Once a routing decision is made, the request is forwarded to the chosen pod, and the response is streamed back to the client.
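The selection logic above can be sketched in a few lines. This is a simplified illustration, not the extension’s actual scoring algorithm: it assumes two metrics (queue depth and free GPU memory) and a simple ordering (shortest queue first, most free GPU memory as the tiebreaker):

```python
# Illustrative sketch of inference-aware endpoint selection: prefer the
# model-serving pod with the shortest request queue, breaking ties by the
# most free GPU memory. Metrics and weighting are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class PodMetrics:
    name: str
    queue_depth: int        # pending inference requests (e.g. from Prometheus)
    free_gpu_mem_gb: float  # available GPU memory (e.g. from Prometheus)

def pick_endpoint(pods: list[PodMetrics]) -> PodMetrics:
    # Lower queue depth wins; more free GPU memory breaks ties.
    return min(pods, key=lambda p: (p.queue_depth, -p.free_gpu_mem_gb))

pods = [
    PodMetrics("llm-0", queue_depth=4, free_gpu_mem_gb=10.0),
    PodMetrics("llm-1", queue_depth=1, free_gpu_mem_gb=2.0),
    PodMetrics("llm-2", queue_depth=1, free_gpu_mem_gb=18.0),
]
print(pick_endpoint(pods).name)  # llm-2
```

The key point is that the decision uses live serving-state metrics rather than round-robin or connection counts, which is what makes it effective for long-running, GPU-bound requests.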

This approach ensures that requests to AI/LLM models are distributed efficiently across available GPUs, preventing overload on specific instances while maximizing overall system performance. By introducing inference-aware logic into the routing layer, Kubernetes can optimize both latency and GPU utilization far beyond what traditional load-balancing or scheduling techniques allow.

Learn more about the inference extensions and how they are implemented in Kgateway

Introducing an optimized data plane for MCP and A2A protocols

With agentic AI changing the way organizations build and deliver applications, organizations face the challenge of rapidly adopting new technologies and interoperability protocols to connect agents and tools in fragmented environments. Accelerating agent development requires infrastructure that can outlast this rapidly changing landscape.

For example, the Model Context Protocol (MCP) is an open protocol that standardizes how Large Language Model (LLM) applications connect to various external data sources and tools. Without MCP, you need to implement custom integrations for each tool that your LLM application needs to access. However, this approach is hard to maintain and can cause issues when you want to scale your environment. With MCP, you can significantly speed up, simplify, and standardize these types of integrations.
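For a sense of what that standardization looks like on the wire, MCP messages are JSON-RPC 2.0. The sketch below builds a `tools/call` request; the method and `{name, arguments}` params follow the MCP specification, while the tool name and its arguments are made up for illustration:

```python
import json

# Sketch of an MCP tool invocation as a JSON-RPC 2.0 message.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",               # hypothetical tool exposed by a server
        "arguments": {"city": "Amsterdam"},  # tool-specific input
    },
}

# Serialize for transport; any compliant MCP server accepts this shape.
wire = json.dumps(tool_call)
```

Because every tool server speaks this same message shape, an LLM application needs one client implementation instead of one integration per tool, and an intermediary proxy can understand and police the traffic.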

While MCP provides an excellent foundation for interoperability between agents and tools, there are still challenges to address when integrating MCP clients and MCP tool server implementations. Kgateway v2.0.0 includes a custom MCP proxy, which addresses common friction points in securing, scaling, and integrating MCP clients with tool servers, including:

The team behind kgateway has expanded its MCP proxy into Agent Gateway, a new Rust-based proxy server that lets you connect, discover, federate, and secure agent-to-agent and agent-to-tool communications in any environment — including bare metal, VMs, containers, and Kubernetes. Agent Gateway supports popular agent protocols such as MCP and A2A, and can integrate existing REST APIs as agent-native tools. Like Envoy, Agent Gateway can be configured via an xDS control plane, making kgateway the perfect control plane for AI use cases. Support for Agent Gateway will replace the MCP proxy in an upcoming release.

From Gloo to kgateway

Today, Gloo runs in thousands of production environments as a tier-0 application. You’re probably using it every day without realizing it: for millions of people in the US alone, swiping a credit card, ordering fast food, or ordering a new iPhone means going through Gloo.

Last November, my colleagues Idit and Keith stood on the stage at KubeCon + CloudNativeCon North America and announced Solo.io’s intention to donate Gloo to the CNCF. The TOC quickly voted to accept the project, renamed to kgateway, and we made our first major release of kgateway (v2.0.0) on April 1 – no joke! 

Gloo is now a downstream of kgateway, where Solo will continue developing enterprise features. We already have maintainers from six other organizations, and are looking forward to many more adopters and maintainers joining as development in kgateway continues into its eighth year. We’ve recently accepted a major new feature — global rate limiting — from a kgateway end user (thanks, Mayowa!).

Onwards and upwards — north, south, east and west!