Kgateway may be new to the CNCF, but it’s not new to the market: it was born as “Gloo” in 2018, a project to provide modern API management within Kubernetes. Gloo built a large user base during cloud native’s first major evolution, from monoliths to microservices. Now, in 2025, another evolution is taking place as users move from APIs to agents, with new workloads requiring new connectivity features. Kgateway combines the solid foundations of the past with innovation that sets the project up as the gateway of choice for the future: new features for calling and serving LLMs, and for enhancing the use of agentic AI.
Traditional cloud native networking
Ingress
An ingress, edge router, or application gateway is a service that routes traffic to services running inside a Kubernetes environment. (It might be directly accessible from the Internet, or reached via another corporate gateway.)
Traditionally, this service was managed by a Kubernetes object also called an Ingress, and as such “ingress” in a Kubernetes context normally refers to an operator configuration where a control plane configures a hardware or software proxy server. The traffic can include API traffic, but can also include general traffic like web pages and images.
There are three major components that make up any ingress solution:
- control plane
- data plane (proxy server)
- configuration language
The Kubernetes Gateway API is a project designed to standardize the configuration language, replacing the legacy Ingress API with an extensible API across the industry.
Ingress projects therefore differentiate on their data plane (Envoy, HAProxy, NGINX, Traefik etc) and control plane.
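To make the standardized configuration language concrete, here is a minimal sketch of a Gateway and HTTPRoute using the Gateway API. The GatewayClass, hostnames, and service names are illustrative; the only kgateway-specific part is the `gatewayClassName`, which kgateway registers when installed.

```yaml
# A Gateway bound to the kgateway GatewayClass, plus an HTTPRoute
# that sends /api traffic for www.example.com to a backend Service.
# Names, ports, and hostnames are illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http-gateway
spec:
  gatewayClassName: kgateway
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-route
spec:
  parentRefs:
  - name: http-gateway
  hostnames:
  - "www.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: httpbin
      port: 8000
```

Because this is the standard Gateway API, the same HTTPRoute works unchanged against any conformant implementation; only the GatewayClass selects the data plane.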
Envoy is the future-proof engine for cloud native Layer 7 (L7): it was built from the ground up for such use cases. Kgateway (then Gloo) was one of the earliest adopters of Envoy.
Depending on the control plane implementation, there can be substantial performance and scalability differences. The Kgateway control plane is one of the fastest, most battle-tested control planes for Envoy, reflecting configuration changes more quickly while using less memory and CPU.
API gateway
An API gateway is a service that provides centralized gateway functionality for API traffic. The concept took off as service-oriented architectures were growing in popularity in the 2005-2015 era, with early vendors targeting backends that were run on Java. As Kubernetes and microservices became popular, the need arose to offer API gateway functionality that integrated with it; this was the initial use case for Gloo.
Kgateway supports many API use cases above and beyond those of a traditional ingress:
- Routing and aggregation: Similar to an ingress, make APIs from different backend services (including serverless functions) available at a single endpoint, behind a single API key.
- Rate limiting: Limit how often certain APIs can be called, based on parameters such as identity, load, or cost.
- Security: Provide authentication and authorization, and Web Application Firewall functionality.
- Logging and monitoring: Track and display usage in order to provide visibility into the system to help with use cases such as API management, debugging, and accurate billing data.
These advanced features are implemented as extensions to the Gateway API, which means you can use a consistent, thoughtfully designed API without ever having to “break glass” into configuring Envoy directly.
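As a rough illustration of what such an extension looks like, here is a hedged sketch of a kgateway policy attached to an HTTPRoute using Gateway API policy attachment. The API group, kind, and field names are assumptions based on kgateway v2’s CRDs and may differ between versions; consult the kgateway documentation for the exact schema.

```yaml
# Hypothetical sketch: a kgateway policy applying a local rate limit
# to an existing HTTPRoute via targetRefs (Gateway API policy
# attachment). Field names are illustrative, not authoritative.
apiVersion: gateway.kgateway.dev/v1alpha1
kind: TrafficPolicy
metadata:
  name: api-protection
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httpbin-route
  rateLimit:
    local:
      tokenBucket:
        maxTokens: 100
        tokensPerFill: 10
        fillInterval: 1s
```

The pattern to note is that the policy is a separate resource targeting a standard Gateway API object, so routing stays portable while advanced behavior layers on top.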
Service mesh waypoint
Ambient mesh is the new sidecarless data plane mode in the Istio service mesh. One of the key innovations of ambient mesh is that it splits Istio’s functionality into two distinct layers: a lightweight, secure overlay layer (implemented by a purpose-built node proxy) and a Layer 7 processing layer (implemented by L7 proxies called waypoints). The design of ambient mesh purposefully kept the secure overlay layer very lightweight with minimal function. The L7 layer was designed to be feature-rich and pluggable, enabling you to use your preferred L7 proxy as your waypoint.
Kgateway is the first project that can be used as a pluggable waypoint for Istio. Built on the same Envoy engine that Istio’s waypoint implementation uses, there are nonetheless differentiators that make the use of kgateway as a waypoint a compelling alternative.
- Most Istio users have two completely different gateway implementations: proxies for ingress (or “north-south” traffic) and proxies for internal service-to-service traffic (or “east-west” traffic — in the traditional model, implemented by a mesh of sidecar proxies). Using kgateway for both means you have a single system for traffic in both directions, making it easier to manage and troubleshoot your traffic. With consistent configuration (Gateway API), observability, debugging, and operational experiences, you’ll reduce the complexity of having multiple gateway solutions.
- Kgateway offers first-class APIs for rate limiting, header manipulation, request transformations, external auth and processing. Many of these features are not possible in Istio’s reference waypoint implementation.
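In practice, a waypoint is itself declared as a Gateway resource. The sketch below assumes kgateway registers a waypoint-capable GatewayClass (shown here as `kgateway-waypoint`, an assumed name); the `istio.io/use-waypoint` label is standard Istio ambient mesh configuration.

```yaml
# Sketch: deploy a kgateway-backed waypoint for a namespace. The
# GatewayClass name is an assumption; HBONE on port 15008 is the
# standard Istio ambient mesh listener for waypoints.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: my-apps
spec:
  gatewayClassName: kgateway-waypoint
  listeners:
  - name: mesh
    protocol: HBONE
    port: 15008
```

You then enroll the namespace’s traffic by labeling it, e.g. `kubectl label namespace my-apps istio.io/use-waypoint=waypoint`, after which in-mesh L7 traffic for that namespace is processed by the kgateway waypoint.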
Learn more about extending ambient mesh with Kgateway as a waypoint
AI use cases
Almost overnight, large language models (LLMs) and generative AI became the ultimate API use case. For applications that need to communicate with external services (like cloud APIs or LLM providers such as OpenAI), traffic control becomes even more critical. In such cases, using kgateway as an egress gateway can help manage costs, API key security, and caching, while ensuring sensitive data isn’t leaked.
Secure LLM usage with the AI gateway
AI Gateway unleashes developer productivity and accelerates AI innovation by providing a unified API interface that developers can use to access and consume AI services from multiple LLM providers. You can apply additional traffic management, security, and resiliency policies to the requests to your LLM provider. This set of policies allows you to centrally govern, secure, control, and audit access to your LLM providers.
Some of kgateway’s AI gateway features include:
- Model traffic management: perform A/B model testing between two or more model versions, canary model rollouts to a subset of requests, or fail over to a backup provider if a primary model fails or becomes unavailable
- Prompt enrichment: pre-configure and refactor system and user prompts, extract common AI provider settings so that you can reuse them across requests, dynamically append or prepend prompts to where you need them, and overwrite default settings on a per-route level
- Prompt guarding: ensure that prompt-based interactions with a language model are secure, appropriate, and aligned with the intended use. Filter, block, monitor, and control LLM inputs and outputs to filter offensive content, prevent misuse, and ensure ethical and responsible AI usage
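Consuming an LLM provider through the AI gateway typically means defining it as a backend that routes can target. The sketch below is an assumption modeled on kgateway v2’s Backend CRD; the exact field names (especially under `ai`) may differ in your version, so treat this as illustrative only.

```yaml
# Hypothetical sketch: an AI backend for OpenAI, with the API key
# held in a Kubernetes Secret rather than in application code.
# Field names are assumptions; check the kgateway AI gateway docs.
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  name: openai
spec:
  type: AI
  ai:
    llm:
      openai:
        model: gpt-4o-mini
        authToken:
          kind: SecretRef
          secretRef:
            name: openai-secret   # Secret holding the provider API key
```

An HTTPRoute can then reference this Backend, so applications call a single in-cluster endpoint while the gateway injects credentials and enforces policy centrally.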
Host models with the Inference Gateway
Unlike traditional web traffic, AI inference requests have unique characteristics that make conventional load-balancing techniques less effective. Inference requests often take much longer to process, sometimes minutes rather than milliseconds, and involve significantly larger payloads. A single request can consume an entire GPU, making scheduling decisions far more impactful than those for standard API workloads. At times, requests need to queue up while others are being processed.
To address these challenges, the Kubernetes Gateway API Inference Extension introduces APIs for inference-aware routing, which are implemented in kgateway. Instead of forwarding the request to an Envoy load-balancing pool, kgateway invokes an inference-aware endpoint selection extension. This evaluates the live state of model-serving instances by watching Prometheus metrics, considering factors such as LLM queue depth and available GPU memory. Based on these real-time metrics, it selects the best model server pod for the request, ensuring better resource utilization and lower latency. Once a routing decision is made, the request is forwarded to the chosen pod, and the response is streamed back to the client.
This approach ensures that requests to AI/LLM models are distributed efficiently across available GPUs, preventing overload on specific instances while maximizing overall system performance. By introducing inference-aware logic into the routing layer, Kubernetes can optimize both latency and GPU utilization far beyond what traditional load-balancing or scheduling techniques allow.
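The flow above can be sketched with the Inference Extension’s resources: an InferencePool groups the model-server pods behind an endpoint-picker extension, and a standard HTTPRoute targets the pool as a backend. Resource names and the selector labels are illustrative; the API group and version reflect the extension’s alpha releases and may change.

```yaml
# Sketch: an InferencePool for vLLM pods, with an endpoint-picker
# extension making the per-request scheduling decision, referenced
# from an HTTPRoute like any other backend.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3
spec:
  selector:
    app: vllm-llama3        # label on the model-server pods
  targetPortNumber: 8000
  extensionRef:
    name: vllm-llama3-epp   # the endpoint-picker service (assumed name)
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3
```

The key design point is that the HTTPRoute stays ordinary Gateway API; only the backend kind changes, which is what lets inference-aware scheduling slot into existing routing.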
Learn more about the inference extensions and how they are implemented in Kgateway
Introducing an optimized data plane for MCP and A2A protocols
With agentic AI changing the way organizations build and deliver applications, organizations face the challenge of rapidly adopting new technologies and interoperability protocols to connect agents and tools in fragmented environments. To accelerate agent development, teams need infrastructure that insulates them from this rapidly changing landscape.
For example, the Model Context Protocol (MCP) is an open protocol that standardizes how Large Language Model (LLM) applications connect to various external data sources and tools. Without MCP, you need to implement custom integrations for each tool that your LLM application needs to access. However, this approach is hard to maintain and can cause issues when you want to scale your environment. With MCP, you can significantly speed up, simplify, and standardize these types of integrations.
While MCP provides an excellent foundation for interoperability between agents and tools, there are still challenges to address when integrating MCP clients and MCP tool server implementations. Kgateway v2.0.0 includes a custom MCP proxy, which addresses common friction points in securing, scaling, and integrating MCP clients with tool servers, including:
- Simplifying tool onboarding: automated discovery and registration of MCP tool servers; providing developers with a centralized registry of MCP tools across heterogeneous tool servers regardless of their location
- MCP multiplexing: enable access to any MCP tool by using a single endpoint with innovative MCP multiplexing that turns an entire ecosystem of thousands of tools into a virtualized MCP tool server
- Securing MCP tool server implementations: providing consistent authentication and authorization controls for multi-tenant consumption
- Added observability: providing deep insights into AI agent and tool integrations with centralized metrics, logging, and tracing for all tool calls
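To illustrate the multiplexing idea, here is a purely hypothetical sketch of exposing several MCP tool servers behind a single backend. This is not the actual kgateway schema; every field name here is invented to show the shape of the concept, so refer to the kgateway MCP documentation for the real API.

```yaml
# Purely illustrative: one virtual MCP endpoint federating multiple
# tool servers. All field names below are hypothetical.
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  name: mcp-tools
spec:
  type: MCP
  mcp:
    targets:
    - name: github-tools
      static:
        host: mcp-github.tools.svc.cluster.local
        port: 8080
    - name: weather-tools
      static:
        host: mcp-weather.tools.svc.cluster.local
        port: 8080
```

An MCP client pointed at the single gateway endpoint would then see the union of tools from both servers, with authentication and observability applied centrally.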
The team behind kgateway has expanded its MCP proxy into Agent Gateway, a new Rust-based proxy server that lets you connect, discover, federate, and secure agent-to-agent and agent-to-tool communications in any environment — including bare metal, VMs, containers, and Kubernetes. Agent Gateway supports popular agent protocols, such as MCP and A2A, and can integrate existing REST APIs as agent-native tools. Like Envoy, Agent Gateway can be configured through xDS, making kgateway the perfect control plane for AI use cases. Support for Agent Gateway will replace the MCP proxy in an upcoming release.
From Gloo to kgateway
Today, Gloo runs in thousands of production environments as a tier-0 application. You’re probably using it every day. You may not realize it, but for millions of people in the US alone, when you swipe your credit card, when you order fast food, or when you order a new iPhone, you’re using Gloo.
Last November, my colleagues Idit and Keith stood on the stage at KubeCon + CloudNativeCon North America and announced Solo.io’s intention to donate Gloo to the CNCF. The TOC quickly voted to accept the project, which was renamed kgateway, and we made our first major release (v2.0.0) on April 1 – no joke!
Gloo is now a downstream of kgateway, where Solo will continue developing enterprise features. We already have maintainers from six other organizations, and are looking forward to many more adopters and maintainers joining as development in kgateway continues into its eighth year. We’ve recently accepted a major new feature — global rate limiting — from a kgateway end user (thanks Mayowa!)
Onwards and upwards — north, south, east and west!