HeadsUpAI

OpenRouter Launches Response Caching to Deliver Free and Instant Identical Requests

· Updated

OpenRouter, a unified API for accessing hundreds of language models, launched a beta response caching feature that stores the full output of identical requests. By hashing the request body and parameters, the system returns stored responses from the edge without contacting the provider, resulting in zero token charges and sub-second latency.
Cache hit cost
$0 (zero tokens)
Cache hit latency
80-300ms
Cache TTL range
1 second to 24 hours
Supported endpoints
Chat completions, embeddings, messages
Availability
Beta, free for all users

This update matches the latency focus of OpenAI's agentic loops and automated testing, where identical prompts are often sent repeatedly during retries. It follows OpenRouter's model alias system and adds to OpenRouter Workspaces to streamline how developers manage and optimize their inference infrastructure.

You can enable caching by adding the X-OpenRouter-Cache: true header to chat completions or embedding requests, or by toggling it within a preset. The feature is currently free in beta and supports text, images, and tool calls, with configurable expiration times ranging from one second to 24 hours.

OpenRouter
OpenRouter
@OpenRouter
X

Introducing Response Caching: save tons of money and time on tests and agent retries. Blog post: https://t.co/1tasyIRssI Available for free. Learn more 👇 https://t.co/It2AXRhPAm

69retweets1.1klikes
View on X

Still wondering? A few quick answers below.

OpenRouter response caching is a beta feature that stores the full output of identical API requests at the network edge. When a user sends a request that matches a previously cached one, the system returns the stored response immediately. This eliminates the need to contact the model provider, saving both time and money.

When caching is enabled via a header or preset, OpenRouter creates a unique hash based on the request body, model, API key, and streaming mode. If a match is found in the cache, the response is delivered in roughly 80 to 300 milliseconds. Users can control the cache duration from one second up to 24 hours.

Prompt caching reduces the cost of processing input text when multiple requests share a common prefix at the provider level. In contrast, OpenRouter response caching skips the provider entirely by returning the full generated output from an edge cache. This results in zero tokens billed for the entire request, including both the prompt and the completion.

During the current beta period, OpenRouter response caching is available for free. While the first request that populates the cache is billed normally by the model provider, every subsequent identical request that hits the cache is billed at zero tokens. This applies to both the input prompt and the generated output tokens.

Response caching currently works across the chat completions, responses, messages, and embeddings endpoints. It supports text, images, audio, and tool calls, though very large multimodal payloads may be ineligible. Legacy completions, text-to-speech, speech-to-text, reranking, and video generation endpoints are not supported during the initial beta phase of the feature.

Share this update