HeadsUpAI

Anthropic Launches Cache Diagnostics to Debug Silent Claude API Cost Spikes

Anthropic launched Cache Diagnostics in beta, providing visibility into prompt cache misses within the Claude Console and API. Prompt caching requires a byte-for-byte match of the prompt prefix to save cost. The feature identifies the specific divergence point—like changes in system prompts or tools—that invalidated the cache.
Feature state
Beta
Required header
cache-diagnosis-2026-04-07
Diagnostic types
model_changed, system_changed, tools_changed, and messages_changed
Availability
Claude API only
Data retention
Zero Data Retention eligible

This update addresses a friction point in AI economics where minor, non-deterministic changes cause 10x cost spikes. While the Claude Prompt Caching Dashboard tracks hit rates, diagnostics explain the cause. It follows recent optimizations like Claude's prompt cache pre-warming, completing the toolset needed for reliable, high-context agentic workflows.

To use the feature, include the cache-diagnosis-2026-04-07 beta header and pass the previous response id. The API returns a cache_miss_reason and an estimate of missed tokens, helping you fix root causes like reordered tool arrays. It is currently available on the Claude API but not on Amazon Bedrock or Vertex AI.

ClaudeDevs
ClaudeDevs
@ClaudeDevs
X

Prompt cache diagnostics are now in Claude Console. When a request misses the cache, you can now see exactly which part of your prompt changed and how many tokens it cost you. https://t.co/z0dV6zzLPm

121retweets2.5klikes
View on X

Still wondering? A few quick answers below.

Cache Diagnostics is a beta feature for the Claude API that helps developers identify why a prompt failed to hit the cache. It compares consecutive requests to find the exact point where a prompt prefix diverged, such as a changed system instruction or a reordered tool list, which silently invalidates the expected cost savings.

When you include a specific beta header in your API request, Anthropic stores a lightweight fingerprint of the prompt. By passing the ID of a previous message, the API compares the new request against the stored fingerprint and returns a reason for any cache miss, such as a change in the model, system prompt, or tools.

This feature is currently available exclusively through the Claude API and the Claude Developer Console. It is not supported on third-party cloud platforms like Amazon Bedrock or Google Cloud Vertex AI. Developers must use the Anthropic API directly and include the required beta header to access these diagnostic insights during the beta period.

Common causes include interpolating dynamic data like timestamps into the system prompt, reordering the tools array, or using non-deterministic JSON serialization for tool schemas. Other reasons include changing the model between turns or editing earlier messages in the conversation history rather than treating the message list as an append-only structure.

No, this feature is eligible for Zero Data Retention and does not store raw prompt text or model outputs. It only retains cryptographic hashes and token-count estimates, known as fingerprints, for a short period. These fingerprints are scoped to your specific organization and workspace and are used solely for comparing consecutive requests.

Share this update