Useful tip to cut time-to-first-token on longer prompts in the API: pre-warm the prompt cache. Send your system prompt before the user prompt. Claude writes it to the cache, but skips generating any output. When the real user request lands, it'll hit a warm cache. https://t.co/6BdEzbamr2
Anthropic Launches Prompt Cache Pre-Warming to Eliminate Initial Claude API Latency
Anthropic· Updated
Anthropic introduced a pre-warming method for the Claude API that uses a zero-token limit to load prompts into the cache without generating output. This allows developers to eliminate the latency penalty on the first request of a session for high-context applications. By proactively caching system instructions or large documents, tools like coding agents can achieve near-instant response times.
max_tokens: 0, the API processes the input and writes it to the cache without generating a response. This official method replaces the previous community workaround of using a single-token limit.- Cache write cost (5m)
- 1.25x base input price
- Cache write cost (1h)
- 2x base input price
- Cache read cost
- 0.1x base input price
- Minimum cache length
- 1,024 to 4,096 tokens
- Availability
- Claude API, AWS, Microsoft Foundry
Latency remains the primary friction point for Claude Code's autonomous workflows. While Claude's prompt caching dashboard reduced costs, the first request in a session still suffered a "cache miss" delay. Pre-warming removes this bottleneck, mirroring OpenAI's persistent connections to speed up autonomous loops.
To implement this, you must use explicit cache_control breakpoints on your static content. The pre-warm request is billed as a cache write but incurs no output token costs. Note that max_tokens: 0 is incompatible with streaming, extended thinking, or structured outputs. This feature is available across the Claude API, AWS, and Microsoft Foundry.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

