Anthropic Launches Cache Diagnostics to Debug Silent Claude API Cost Spikes

Anthropic

May 18, 2026 · Updated Jun 12, 2026

Anthropic introduced Cache Diagnostics, a beta feature that identifies exactly why a prompt failed to hit the cache. By comparing consecutive requests, developers can now pinpoint silent cache breakers like reordered tools or dynamic timestamps that inflate API expenses.

Anthropic launched Cache Diagnostics in beta, providing visibility into prompt cache misses within the Claude Console and API. Prompt caching requires a byte-for-byte match of the prompt prefix to save cost. The feature identifies the specific divergence point—like changes in system prompts or tools—that invalidated the cache.

Feature state: Beta
Required header: cache-diagnosis-2026-04-07
Diagnostic types: model_changed, system_changed, tools_changed, and messages_changed
Availability: Claude API only
Data retention: Zero Data Retention eligible

This update addresses a friction point in AI economics where minor, non-deterministic changes cause 10x cost spikes. While the Claude Prompt Caching Dashboard tracks hit rates, diagnostics explain the cause. It follows recent optimizations like Claude's prompt cache pre-warming, completing the toolset needed for reliable, high-context agentic workflows.

To use the feature, include the cache-diagnosis-2026-04-07 beta header and pass the previous response id. The API returns a cache_miss_reason and an estimate of missed tokens, helping you fix root causes like reordered tool arrays. It is currently available on the Claude API but not on Amazon Bedrock or Vertex AI.

View the full update on platform.claude.com

ClaudeDevs

@ClaudeDevsMay 18

Prompt cache diagnostics are now in Claude Console. When a request misses the cache, you can now see exactly which part of your prompt changed and how many tokens it cost you. https://t.co/z0dV6zzLPm

1332.7k

View on X

Still wondering? A few quick answers below.

Cache Diagnostics is a beta feature for the Claude API that helps developers identify why a prompt failed to hit the cache. It compares consecutive requests to find the exact point where a prompt prefix diverged, such as a changed system instruction or a reordered tool list, which silently invalidates the expected cost savings.

When you include a specific beta header in your API request, Anthropic stores a lightweight fingerprint of the prompt. By passing the ID of a previous message, the API compares the new request against the stored fingerprint and returns a reason for any cache miss, such as a change in the model, system prompt, or tools.

This feature is currently available exclusively through the Claude API and the Claude Developer Console. It is not supported on third-party cloud platforms like Amazon Bedrock or Google Cloud Vertex AI. Developers must use the Anthropic API directly and include the required beta header to access these diagnostic insights during the beta period.

Common causes include interpolating dynamic data like timestamps into the system prompt, reordering the tools array, or using non-deterministic JSON serialization for tool schemas. Other reasons include changing the model between turns or editing earlier messages in the conversation history rather than treating the message list as an append-only structure.

No, this feature is eligible for Zero Data Retention and does not store raw prompt text or model outputs. It only retains cryptographic hashes and token-count estimates, known as fingerprints, for a short period. These fingerprints are scoped to your specific organization and workspace and are used solely for comparing consecutive requests.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Anthropic →

Keep reading

Anthropic Launches Claude Prompt Caching Dashboard to Optimize API Costs

Anthropic introduced a dedicated dashboard in the Claude Developer Console to provide visibility into prompt caching performance. This allows developers to track cache hit rates and reduce both API expenses and latency for high-context workloads.

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

ClaudeMay 21

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

Anthropic is adding a new usage command to its Claude Code terminal agent to provide granular visibility into token consumption across specific skills and tools. This update shifts agentic development from a black-box experience to a transparent one where developers can profile and optimize their AI spending.

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider

OpenRouterJun 7

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider

OpenRouter now displays real-time cache hit rates and historical traffic data on its Pricing tab. This update provides transparency into how different model providers compare on effective pricing for LLMs like Anthropic's Claude Opus 4.8, enabling users to optimize costs.

What is Anthropic Cache Diagnostics?

How does Claude Cache Diagnostics work?

Which platforms support Claude Cache Diagnostics?

What are the common reasons for a Claude prompt cache miss?

Does Cache Diagnostics store the content of my prompts?

Keep reading

Anthropic Launches Claude Prompt Caching Dashboard to Optimize API Costs

Anthropic Launches Claude Prompt Caching Dashboard to Optimize API Costs

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider

Keep reading

Anthropic Launches Claude Prompt Caching Dashboard to Optimize API Costs

Anthropic Launches Claude Prompt Caching Dashboard to Optimize API Costs

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

Anthropic Previews Token Usage Breakdown for Claude Code Agentic Workflows

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider

OpenRouter Reveals Real-Time Cache Hit Rates and Effective LLM Pricing by Provider