MCP tool sprawl: why more servers can make agents worse

Give an agent access to two well-described tools, and tool choice is usually obvious. Give it ten MCP servers with overlapping verbs, inconsistent schemas, and broad descriptions, and tool choice becomes part of the problem. The conventional wisdom — connect more capabilities, get a more capable agent — is partly right and partly a trap. The trap is that every tool an agent can see is a tool the agent has to pay attention to, and attention has a cost.

This is MCP tool sprawl. It isn't a bug in MCP, and it isn't a flaw in any particular server. It's an emergent property of the way agents reason over their available toolset. What makes it costly is that teams add MCP servers expecting capability gains and discover the reasoning cost only later, usually after the agent starts picking the wrong tools, calling redundant ones, or burning context budget on definitions for tools it never uses.

The shape of the problem has three parts: token cost, selection quality, and description quality. Each is manageable in isolation. Together, they decide how well your agent can actually use the tools you give it.

Why more tools can mean worse tool selection

Tool selection is a search problem. Given a user request and a set of available tools, the model has to identify which tool (or sequence of tools) matches the request. With two tools, it chooses between two options. With two hundred, it chooses among two hundred. The model doesn't literally evaluate each one in turn, but every definition sits in its context while it decides, so every one is a candidate that competes for the choice.
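The following is a toy illustration of why a larger, overlapping toolset makes the choice less obvious. It is not how a language model actually selects tools. Word overlap (Jaccard similarity) between the request and each description stands in for whatever the model attends to. All tool names and descriptions are hypothetical. The quantity to watch is the margin between the best and second-best candidate.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase alphabetic words; a crude stand-in for whatever the model attends to.
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_tools(request: str, tools: dict[str, str]) -> list[tuple[str, float]]:
    # Score each tool by Jaccard overlap between request words and description words.
    req = tokenize(request)
    scored = []
    for name, description in tools.items():
        desc = tokenize(description)
        union = req | desc
        scored.append((name, len(req & desc) / len(union) if union else 0.0))
    # Best match first; sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def selection_margin(request: str, tools: dict[str, str]) -> float:
    # Gap between the top two candidates: a smaller gap means a less obvious choice.
    ranked = rank_tools(request, tools)
    if len(ranked) < 2:
        return ranked[0][1] if ranked else 0.0
    return ranked[0][1] - ranked[1][1]


small = {
    "search_issues": "Search issues in the bug tracker by keyword",
    "send_email": "Send an email message to a recipient",
}
# Hypothetical overlapping tools, as if three more servers were connected.
large = {
    **small,
    "tracker_find": "Find issues in the tracker by keyword or label",
    "search_tickets": "Search support tickets by keyword",
    "query_issues": "Query issues in the project board by keyword",
}
request = "search the tracker for issues about login"

print(selection_margin(request, small))  # ~0.36
print(selection_margin(request, large))  # ~0.13
```

The right tool still ranks first in both cases. However, its lead over the runner-up shrinks by almost two thirds once loosely similar tools join the set. That shrinking margin is the room in which a real model can pick the wrong tool.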

Tool-use benchmarks increasingly treat tool selection as its own failure mode, not just a question of whether the model is "smart enough." A model can understand the task and still choose the wrong function, skip a necessary function, or pass the wrong arguments. That risk gets worse when the available tools are numerous, overlapping, or similarly described. Recent MCP-focused benchmark work also calls out long tool descriptions and parameter schemas as a practical limit on how many tools can be made available in a single run.

The failure mode isn't always obvious. An agent might call the right tool but with subtly wrong parameters because it confused two similar tool schemas. An agent might call a redundant pair of tools because both seemed plausible. An agent might skip a necessary call entirely because something else in the toolset matched more loosely. None of these read as "tool selection failure" in logs; they read as "the agent did the wrong thing."
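These failures are hard to spot in logs, so one option is to audit the toolset before the agent sees it. The sketch below is one way to do that, not a standard method. It flags pairs of tools whose descriptions read alike (using `difflib.SequenceMatcher`) and whose parameter names overlap (Jaccard on the top-level `properties` of each tool's `inputSchema`, the JSON Schema field MCP tool definitions use). Several parts are assumptions for illustration:

- the equal weighting of the two signals;
- the 0.6 threshold;
- the `tool` helper;
- all tool names.

```python
from difflib import SequenceMatcher
from itertools import combinations


def schema_fields(tool: dict) -> set[str]:
    # Top-level parameter names from an MCP-style inputSchema (a JSON Schema object).
    return set(tool.get("inputSchema", {}).get("properties", {}))


def confusable_pairs(tools: list[dict], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    # Flag tool pairs whose descriptions and parameter names look alike.
    flagged = []
    for a, b in combinations(tools, 2):
        desc_sim = SequenceMatcher(
            None, a["description"].lower(), b["description"].lower()
        ).ratio()
        fa, fb = schema_fields(a), schema_fields(b)
        field_sim = len(fa & fb) / len(fa | fb) if fa | fb else 0.0
        score = (desc_sim + field_sim) / 2  # equal weighting is an arbitrary choice
        if score >= threshold:
            flagged.append((a["name"], b["name"], round(score, 2)))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


def tool(name: str, description: str, *fields: str) -> dict:
    # Hypothetical helper that builds a minimal tool definition.
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {f: {"type": "string"} for f in fields},
        },
    }


tools = [
    tool("github_search_issues", "Search issues by keyword", "query", "repo"),
    tool("linear_search_issues", "Search issues by keyword", "query", "team"),
    tool("send_email", "Send an email", "to", "subject", "body"),
]
print(confusable_pairs(tools))  # [('github_search_issues', 'linear_search_issues', 0.67)]
```

A flagged pair is a prompt to rename, re-describe, or drop one of the tools. It is not proof that the agent will confuse them.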

The token cost of idle tools

Before any work begins, every visible tool definition competes for space in the model's context. Many MCP servers expose multiple tools (often separate operations for search, list, read, create, update, delete, or destination-specific actions). Each tool has a name, a description, and a parameter schema, and descriptions often embed usage examples as well. Connect several servers and the agent starts every conversation with thousands of tokens of tool definitions before it reads a single character of the user's prompt.
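A rough way to see this overhead is to serialize each tool definition and estimate its size. The sketch below rests on three loudly labeled assumptions:

- definitions are rendered into context as compact JSON;
- roughly four characters make one token, which is a rule of thumb (real counts depend on the model's tokenizer and on how the client formats tools);
- the `crud_tools` servers are hypothetical.

Their descriptions are deliberately terse, so real servers will usually cost more.

```python
import json

CHARS_PER_TOKEN = 4  # assumed rule of thumb; real counts depend on the tokenizer


def estimate_tokens(tool: dict) -> int:
    # Compact JSON as a proxy for how the definition lands in context (assumption).
    text = json.dumps(tool, separators=(",", ":"))
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def context_overhead(servers: dict[str, list[dict]]) -> tuple[dict[str, int], int]:
    # Estimated tokens per server, plus the total paid before the user's prompt is read.
    per_server = {
        name: sum(estimate_tokens(t) for t in tools) for name, tools in servers.items()
    }
    return per_server, sum(per_server.values())


def crud_tools(resource: str) -> list[dict]:
    # Hypothetical CRUD-style server: one tool per verb, with very short descriptions.
    return [
        {
            "name": f"{verb}_{resource}",
            "description": f"{verb.capitalize()} {resource} records.",
            "inputSchema": {
                "type": "object",
                "properties": {"id": {"type": "string"}, "query": {"type": "string"}},
            },
        }
        for verb in ("search", "list", "read", "create", "update", "delete")
    ]


servers = {name: crud_tools(name) for name in ("issues", "docs", "tickets", "contacts")}
per_server, total = context_overhead(servers)
print(per_server, total)
```

The estimate grows linearly with every tool on every connected server, whether or not the agent ever calls it. That linear growth is the "idle tool" cost described above.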
