Most developers now use AI coding assistants daily. As these tools become more advanced and widely adopted, usage quotas and rate limits have become a familiar frustration. Many providers enforce weekly or monthly usage caps to manage compute costs. Once you hit a limit, you might be blocked, throttled, or shifted to slower processing queues, halting productivity and disrupting your workflow.
But there’s a surprising reason behind much of this token consumption: wasted tool metadata.
As contributors exploring the Model Context Protocol (MCP) have found, the vast majority of tokens consumed by AI coding assistants come not from your prompts or code, but from unnecessary tool descriptions that get bundled into each request.
Where the waste comes from
Let’s say you’ve installed MCP servers for GitHub, Grafana, and Notion. You ask your AI coding assistant to:
“List the 10 most recent issues from my GitHub repo.”
That simple prompt uses 102,000 tokens, not because the task is complex, but because the model receives metadata for 114 tools, most of which have nothing to do with the request.
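To see where those tokens go, consider what a single tool definition carries. The sketch below shows a hypothetical MCP-style tool entry in Python; the name, description, and schema are illustrative, not copied from any real server, but every installed tool contributes an entry of roughly this shape to the request context.

```python
# A hypothetical MCP-style tool definition, loosely modeled on the shape
# servers advertise when listing their tools. The name, description, and
# schema below are illustrative, not copied from any real server.
list_issues_tool = {
    "name": "list_issues",
    "description": (
        "List issues in a GitHub repository. Supports filtering by "
        "state, labels, assignee, and creation date, with pagination."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
            "per_page": {"type": "integer", "description": "Results per page"},
        },
        "required": ["owner", "repo"],
    },
}

# A single definition like this can serialize to several hundred tokens.
# With 114 tools installed, the metadata alone can account for the bulk
# of the 102,000 tokens quoted above.
```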
Other common prompts create similar waste:
- “Summarize my meeting notes from October 19, 2025” uses 240,600 tokens, again with all 114 tools injected, even though only the Notion server is relevant
- “Search dashboards related to RDS” consumes 93,600 tokens
In each case, only a small fraction of those tokens are relevant to the task. Even saying “hello” burns more than 46,000 tokens.
Multiply that across even a few dozen prompts per day, and you’re burning millions of tokens on context the model doesn’t need. That’s not just expensive; it’s disruptive. In rate-limited enterprise environments or on time-sensitive projects, this inefficiency slows responses, breaks flow, and cuts directly into productivity.
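A rough back-of-envelope calculation makes the scale concrete. The per-prompt figure comes from the GitHub example above; the prompts-per-day count is an assumption for illustration:

```python
# Back-of-envelope estimate using the numbers quoted above.
tokens_per_prompt = 102_000   # the GitHub-issues example
prompts_per_day = 30          # assumed: a modest day of assistant usage

daily_tokens = tokens_per_prompt * prompts_per_day
print(f"{daily_tokens:,} tokens/day")   # 3,060,000 tokens/day
```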
Reducing token waste with smarter tool selection
To address this inefficiency, open-source contributors within the MCP community have developed mechanisms for selective tool discovery and invocation.
For example, you can use tool groups to predefine which tools are exposed to an agent, but this approach is still static and requires ongoing manual effort. Alternatively, you could use an MCP Gateway to filter which MCP servers are presented to an agent based on a user’s role or other criteria, but gateways operate at the server level rather than at the tool level.
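To make the static approach concrete, here is a minimal sketch of tool-group filtering. The group names and tool lists are hypothetical; the point is that a person has to curate and maintain them by hand:

```python
# A minimal sketch of static tool groups. The group names and tool
# lists here are hypothetical, chosen only to illustrate the idea.
TOOL_GROUPS = {
    "github-basics": ["list_issues", "create_issue", "fork_repository"],
    "notion-notes": ["search_pages", "get_page"],
}

def expose_tools(all_tools: list[dict], group: str) -> list[dict]:
    """Return only the tool definitions named in the chosen group."""
    allowed = set(TOOL_GROUPS.get(group, []))
    return [t for t in all_tools if t["name"] in allowed]

# This works, but every new server or renamed tool means editing
# TOOL_GROUPS by hand, and the selection never adapts to the prompt.
```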
A more dynamic alternative is something like the open-source ToolHive MCP Optimizer, which acts as an intelligent broker between models and tools. Rather than sending every tool’s metadata with each interaction, the broker streamlines communication through lightweight primitives such as find_tool and call_tool.
Here’s how it works in practice:
- When a developer issues a request like “Create a new fork of the project,” the assistant first calls the broker’s find_tool primitive.
- The broker identifies only the tools the request actually needs (e.g., a few relevant GitHub tools).
- Those minimal descriptions are sent to the model, which then invokes call_tool to execute the task.
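In code, that round trip might look like the following sketch. The find_tool and call_tool names mirror the primitives described above, but the stub broker and its payload shapes are assumptions for illustration, not the Optimizer’s actual API:

```python
# A self-contained sketch of the broker round trip. find_tool and
# call_tool mirror the primitives named above, but this stub broker
# and its payload shapes are assumptions, not the real API.

class StubBroker:
    """Toy broker holding a catalog of tool definitions."""

    def __init__(self, catalog: dict[str, dict]):
        self.catalog = catalog  # tool name -> tool definition

    def find_tool(self, query: str) -> list[dict]:
        # A real broker would use semantic search over tool metadata;
        # a naive keyword match stands in for it here.
        words = query.lower().split()
        return [t for t in self.catalog.values()
                if any(w in t["description"].lower() for w in words)]

    def call_tool(self, name: str, arguments: dict) -> str:
        # A real broker would forward this call to the owning MCP server.
        return f"invoked {name} with {arguments}"

broker = StubBroker({
    "fork_repository": {
        "name": "fork_repository",
        "description": "Fork a GitHub repository into your account.",
    },
})

# Step 1: the assistant asks which tools match the user's intent;
# only these few definitions enter the model's context, instead of
# metadata for all 114 installed tools.
matches = broker.find_tool("fork the project repository")

# Step 2: the model selects a tool and the broker executes it.
print(broker.call_tool("fork_repository",
                       {"owner": "example-org", "repo": "example-project"}))
```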
This approach dramatically reduces the size of each request — cutting token usage by more than 50% — without slowing responses or losing accuracy.
Community-driven innovation
Efforts like this highlight how open collaboration around MCP standards can improve the efficiency and scalability of AI development workflows. By focusing on interoperability and selective context management, the community is ensuring that developers can connect AI systems to complex toolchains more securely, efficiently, and affordably.
This work also underscores a broader principle: Open standards and open collaboration drive innovation faster than any single proprietary system can.
Looking ahead
Smarter orchestration layers and context brokers represent an important step forward in how AI assistants interact with tools and APIs. They reduce unnecessary overhead, help avoid rate limits, and make AI coding assistants more sustainable for both developers and providers.
Community-led projects are continuing to refine these mechanisms and welcome participation from anyone interested in contributing to the evolving MCP ecosystem.