HeadsUpAI

Vercel Index Reveals Agentic Workloads Drive Majority of Production AI Traffic

Vercel, the frontend cloud platform and creator of the AI SDK, has released its AI Gateway production index, based on traffic from more than 200,000 teams. Agentic AI now carries 58.9 percent of all token volume. The shift builds on Vercel Workflows, which helped double agentic traffic since late 2025.
Agentic token volume: 58.9%
High-volume team fleet size: 35 models (average)
Anthropic spend share: 61%
Google token volume share: 38%
Request fallback rate: 3.5%
B2B vs. B2C token cost: 2x higher (average)

The report shows labs winning different layers of the same application. Anthropic captures 61 percent of spend by handling high-stakes reasoning, a position that follows the launch of the Claude Opus 4.7 high-speed tier. This diversification builds on earlier Google Gemma 4 integration and GPT-5.5 support, both of which push teams toward multi-model adoption.

Design for a multi-model fleet rather than a single model. High-volume teams use an average of 35 models and match spend to the cost of being wrong: token cost in B2B contexts is about twice that of B2C on average, a premium paid for accuracy. Implement automated fallbacks to protect uptime, since 3.5 percent of requests rely on these rescues.

Still wondering? A few quick answers below.

The Vercel AI Gateway production index is a report based on seven months of production traffic data from over 200,000 unique teams. It analyzes how tens of trillions of tokens are routed, spent, and consumed across hundreds of models. Unlike benchmarks, it provides a real-world view of model adoption and usage patterns in live applications.

Agentic workloads, which involve AI models using tools or calling APIs to complete tasks, now account for 58.9 percent of all token volume. This is a significant increase from 31.6 percent just six months ago. These requests are roughly 2.6 times more token-heavy than standard chat interactions because they often involve multi-step chains.
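The figures above allow a quick back-of-the-envelope check. The sketch below computes how much the agentic token share grew and estimates what share of requests would be agentic if every agentic request were exactly 2.6 times as token-heavy as every other request. That uniform-ratio assumption is ours, not the report's, and the function names are illustrative.

```python
# Figures quoted in the report summary.
AGENTIC_TOKEN_SHARE_NOW = 0.589
AGENTIC_TOKEN_SHARE_BEFORE = 0.316
AGENTIC_TOKEN_HEAVINESS = 2.6  # tokens per agentic request vs. standard chat


def growth_factor(before: float, after: float) -> float:
    """Ratio of the new share to the old share."""
    return after / before


def implied_request_share(token_share: float, heaviness: float) -> float:
    """Estimate the fraction of requests that are agentic.

    Assumes (our simplification) that each agentic request uses `heaviness`
    times the tokens of each non-agentic request. Solves
        token_share = h * r / (h * r + (1 - r))
    for r, the agentic fraction of requests.
    """
    return token_share / (heaviness - token_share * (heaviness - 1))


growth = growth_factor(AGENTIC_TOKEN_SHARE_BEFORE, AGENTIC_TOKEN_SHARE_NOW)
requests = implied_request_share(AGENTIC_TOKEN_SHARE_NOW, AGENTIC_TOKEN_HEAVINESS)
print(f"token share grew {growth:.2f}x")            # token share grew 1.86x
print(f"implied agentic request share: {requests:.1%}")  # about 35.5%
```

Under that simplification, roughly a third of requests would account for well over half of all tokens, which is consistent with the multi-step chains the report describes.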

Anthropic leads the market in spend with a 61 percent share, as teams use its high-reasoning models for quality-critical tasks. Google leads in token volume with a 38 percent share, driven by the adoption of Gemini Flash for low-cost, high-frequency workloads. This shows that different providers are winning different layers of the same applications.

While smaller teams use an average of three models, high-volume organizations with over 10 million requests use an average of 35 distinct models. These teams treat models as swappable components in a routing graph, using different models for specific tasks like intent detection, reasoning, and summarization to optimize for cost and performance across their entire application.
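A routing graph of this kind can be as simple as a lookup from task type to model. The sketch below is a minimal illustration, not Vercel's implementation: the model names, token limits, and the `pick_route` helper are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str              # hypothetical model identifier
    max_output_tokens: int  # illustrative per-task output budget


# Hypothetical task-to-model table: cheap models for frequent, low-stakes
# steps, and a stronger model only where accuracy matters most.
ROUTES = {
    "intent_detection": Route("small-fast-model", 64),
    "reasoning": Route("large-reasoning-model", 4096),
    "summarization": Route("mid-tier-model", 512),
}

# Used when a task type has no dedicated entry.
DEFAULT_ROUTE = Route("mid-tier-model", 1024)


def pick_route(task: str) -> Route:
    """Return the route for a task, falling back to the default route."""
    return ROUTES.get(task, DEFAULT_ROUTE)


print(pick_route("reasoning").model)    # large-reasoning-model
print(pick_route("translation").model)  # mid-tier-model
```

Keeping the table as data rather than branching logic is what makes models swappable: replacing a model for one task is a one-line change that leaves the rest of the application untouched.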

Approximately 3.5 percent of requests on Vercel's AI Gateway rely on automated fallbacks to complete successfully. These fallbacks trigger when an initial request hits an error, rate limit, or timeout. Failures are more common in expensive, high-reasoning calls, making fallbacks essential for maintaining uptime in complex agentic workflows and large-scale production environments.
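The fallback pattern itself is straightforward to sketch. The example below tries each model in order and moves on when a call fails. It is a generic illustration under our own assumptions, not the AI Gateway's API: `ModelCallError`, `call_with_fallback`, and the model names are hypothetical, and `fake_call` stands in for a real provider call.

```python
from typing import Callable, Sequence, Tuple


class ModelCallError(Exception):
    """Hypothetical error type covering provider errors, rate limits, and timeouts."""


def call_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelCallError as exc:
            # Record the failure and fall through to the next model.
            errors.append(f"{model}: {exc}")
    raise ModelCallError("all models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real provider call; the primary model always fails here."""
    if model == "primary-reasoning-model":
        raise ModelCallError("rate limited")
    return f"[{model}] answer to: {prompt}"


used, text = call_with_fallback(
    "Summarize the report.",
    ["primary-reasoning-model", "backup-model"],
    fake_call,
)
print(used)  # backup-model
```

In practice the order of the model list encodes the trade-off: the preferred model goes first, and cheaper or more available models follow as rescues.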
