HeadsUpAI

Nous Research launches Tool Search to prevent context bloat in Hermes Agent

Nous Research has released Tool Search for Hermes Agent, a feature that changes how the agent loads tool definitions. A progressive-disclosure layer replaces large MCP (Model Context Protocol) tool schemas with three bridge tools. This keeps the context window (the amount of text a model can process at once) from filling up with technical definitions that are not yet relevant to the task.
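
To make the bridge idea concrete, here is a minimal Python sketch of a progressive-disclosure layer. The class names, the DeferredTool fields, and the example tool github_create_issue are hypothetical; Hermes Agent's real interfaces are not described in the source, and the simple keyword match in tool_search only stands in for its actual ranking.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DeferredTool:
    """A tool whose full schema stays out of the prompt until requested (hypothetical)."""
    name: str
    summary: str                  # one-line description, cheap to search
    schema: dict[str, Any]        # full JSON schema, loaded on demand
    handler: Callable[..., Any]   # function that actually runs the tool


@dataclass
class BridgeLayer:
    """Hypothetical layer exposing only three bridge tools to the model."""
    tools: dict[str, DeferredTool] = field(default_factory=dict)

    def register(self, tool: DeferredTool) -> None:
        self.tools[tool.name] = tool

    def tool_search(self, query: str) -> list[str]:
        # Return names only; a naive keyword match stands in for the real ranker.
        q = query.lower()
        return [t.name for t in self.tools.values()
                if q in t.name.lower() or q in t.summary.lower()]

    def tool_describe(self, name: str) -> dict[str, Any]:
        # Only at this point does the full schema enter the model's context.
        return self.tools[name].schema

    def tool_call(self, name: str, arguments: dict[str, Any]) -> Any:
        # Dispatch to the underlying tool with model-supplied arguments.
        return self.tools[name].handler(**arguments)


layer = BridgeLayer()
layer.register(DeferredTool(
    name="github_create_issue",   # hypothetical MCP tool
    summary="Create an issue in a GitHub repository",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
    handler=lambda title: f"created: {title}",
))
print(layer.tool_search("issue"))                                  # ['github_create_issue']
print(layer.tool_call("github_create_issue", {"title": "Bug"}))   # created: Bug
```

The model sees three small tool definitions no matter how many tools are registered; the per-tool schema cost is paid only for tools it actually describes.
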
Bridge tools: tool_search, tool_describe, tool_call
Activation threshold: 10% of context window
Search algorithm: BM25 with literal substring fallback
Core tools: terminal, read_file, write_file, and others
Configuration modes: auto, on, off
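
The BM25 ranking with a literal substring fallback could look roughly like the following sketch. It assumes common BM25 defaults (k1 = 1.5, b = 0.75), a Lucene-style IDF, and a fallback that fires only when no query term matches any tool; the function name, tool names, and descriptions are made up for illustration and are not Hermes Agent's code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so tool names split on "_".
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def bm25_search(query: str, docs: dict[str, str], k1: float = 1.5,
                b: float = 0.75, limit: int = 5) -> list[str]:
    """Rank tool names by BM25 over name + description; if no query term
    appears anywhere, fall back to a literal substring match (hypothetical)."""
    names = list(docs)
    tokenized = {n: tokenize(n + " " + docs[n]) for n in names}
    n_docs = len(names)
    avgdl = sum(len(t) for t in tokenized.values()) / max(n_docs, 1)
    # Document frequency: how many tool texts contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: dict[str, float] = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        if score > 0:
            scores[name] = score

    if scores:
        return sorted(scores, key=scores.get, reverse=True)[:limit]

    # Fallback: case-insensitive substring match on name + description.
    needle = query.lower()
    return [n for n in names if needle in (n + " " + docs[n]).lower()][:limit]


tools = {
    "github_create_issue": "Create a new issue in a GitHub repository",
    "slack_post_message": "Post a message to a Slack channel",
    "postgres_query": "Run a read-only SQL query against Postgres",
}
print(bm25_search("open an issue on github", tools))  # ['github_create_issue']
print(bm25_search("postgr", tools))                   # ['postgres_query']
```

A partial query such as "postgr" shares no whole token with any description, so BM25 scores nothing and the substring pass still finds postgres_query.
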

This addresses a bottleneck in which tool schemas from several MCP servers take up a large share of the context window, leaving less room for reasoning. By deferring schemas, Hermes Agent follows modular loading patterns similar to those in Agent Skills and Appwrite, which keeps the agent efficient with large toolsets. The update follows the agent's recent integration of Qwen 3.7 Max.

Users can get the feature by running hermes update. In auto mode, Tool Search activates when deferrable tool definitions take up 10% or more of the context window. The first use of a deferred tool adds a round trip to load its schema, but the system caches the result so subsequent turns stay fast.
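
The cost model (one extra round trip on first use, cached afterward) can be illustrated with a small memoizing loader. SchemaCache and its fetch callback are hypothetical stand-ins for however Hermes Agent actually requests a schema from an MCP server.

```python
from typing import Any, Callable


class SchemaCache:
    """Hypothetical on-demand schema loader that remembers what it fetched."""

    def __init__(self, fetch: Callable[[str], dict[str, Any]]):
        self._fetch = fetch                        # e.g. a request to an MCP server
        self._cache: dict[str, dict[str, Any]] = {}
        self.round_trips = 0                       # counts real fetches only

    def get(self, name: str) -> dict[str, Any]:
        if name not in self._cache:
            # First use: pay one extra round trip to load the schema.
            self.round_trips += 1
            self._cache[name] = self._fetch(name)
        # Later turns: served from memory with no extra round trip.
        return self._cache[name]


cache = SchemaCache(lambda name: {"title": name, "type": "object"})
for _ in range(3):
    cache.get("postgres_query")
print(cache.round_trips)  # 1
```
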

Still wondering? A few quick answers below.

Tool Search is a progressive-disclosure feature that manages how AI agents load external tool definitions. Instead of filling the context window with every available tool schema at once, the system uses three bridge tools to search for and load specific definitions only when the model determines they are needed for a task.

By deferring the loading of complex JSON schemas for MCP and plugin tools, the feature preserves more of the model's context window for reasoning and conversation history. This prevents context bloat, which often degrades an agent's ability to follow instructions or maintain accuracy as the available toolset grows larger.

In auto mode, the feature activates whenever deferrable tool schemas would consume at least 10% of the active model's context window. The threshold means the system only accepts the small latency of on-demand loading when the token savings are large enough to justify the extra round trip.
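
The activation rule reduces to a single comparison. This sketch assumes token counts are measured elsewhere and that "on" and "off" force the feature on or off regardless of size; the source lists these modes but does not define them, and should_defer is a hypothetical name. It compares in integer percent to avoid floating-point edge cases at exactly 10%.

```python
def should_defer(mode: str, deferrable_schema_tokens: int,
                 context_window_tokens: int, threshold_pct: int = 10) -> bool:
    """Hypothetical check: replace deferrable schemas with bridge tools?"""
    if mode == "on":
        return True    # assumed: always defer
    if mode == "off":
        return False   # assumed: never defer
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    # Auto: defer only if schemas would use at least threshold_pct of the window.
    return deferrable_schema_tokens * 100 >= threshold_pct * context_window_tokens
```

For example, 20,000 tokens of MCP schemas in a 128,000-token window is about 15.6%, so auto mode would defer them, while 8,000 tokens (6.25%) would stay loaded.
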

Only tools from external MCP servers and non-core plugins are eligible for deferral. Core Hermes Agent capabilities, such as terminal access, file operations, memory management, and web search, are always loaded directly into the context window so the agent's fundamental skills remain immediately available without additional latency.
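
The eligibility rule can be expressed as a simple partition. The source names terminal, read_file, and write_file as core tools; the memory and web_search names, the source labels, and the example tools below are hypothetical placeholders.

```python
from dataclasses import dataclass

# terminal, read_file, write_file come from the article; "memory" and
# "web_search" are hypothetical names for the other core capabilities.
CORE_TOOLS = frozenset({"terminal", "read_file", "write_file", "memory", "web_search"})


@dataclass(frozen=True)
class Tool:
    name: str
    source: str  # hypothetical labels: "core", "plugin", or "mcp"


def is_deferrable(tool: Tool) -> bool:
    # Core capabilities are never deferred, whatever their source label.
    if tool.source == "core" or tool.name in CORE_TOOLS:
        return False
    # Only external MCP tools and non-core plugin tools may be deferred.
    return tool.source in {"mcp", "plugin"}


def partition_tools(tools: list[Tool]) -> tuple[list[Tool], list[Tool]]:
    """Split tools into (always_loaded, deferrable), preserving order."""
    always_loaded = [t for t in tools if not is_deferrable(t)]
    deferrable = [t for t in tools if is_deferrable(t)]
    return always_loaded, deferrable


tools = [
    Tool("terminal", "core"),
    Tool("read_file", "core"),
    Tool("github_create_issue", "mcp"),
    Tool("image_resize", "plugin"),
]
always, deferred = partition_tools(tools)
print([t.name for t in always])    # ['terminal', 'read_file']
print([t.name for t in deferred])  # ['github_create_issue', 'image_resize']
```
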

