HeadsUpAI

Vercel AI Gateway Automates Provider Selection Based on Cost and Performance

Vercel, a frontend cloud platform and the creator of the AI SDK, has launched a sort capability for its AI Gateway. The feature lets developers rank the providers that host a given model by one of three metrics: cost (lowest price), ttft (time-to-first-token latency), or tps (tokens-per-second throughput).

As inference (running a model to generate outputs) becomes commoditized, the same model is often available from dozens of hosts with fluctuating prices. The update mirrors an industry shift toward automated routing, similar to OpenRouter's Pareto Code or Warp's open-weight model selection, in which the infrastructure, rather than the developer, works out the best balance of price and performance.

You enable this by adding a sort parameter to providerOptions.gateway in the Vercel AI SDK. The AI Gateway computes the ranking at request time and automatically bypasses degraded providers. This is particularly useful for agentic workloads, where high token volumes make margin optimization a primary engineering requirement.
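
A minimal sketch of the request options, assuming the providerOptions.gateway.sort shape described above and the three values from Vercel's announcement ('cost', 'ttft', 'tps'). The GatewaySort type and the buildGatewayOptions helper are hypothetical and defined locally so the snippet runs without the SDK installed; the commented generateText call only shows where the object would go, and its model ID is a placeholder.

```typescript
// Hypothetical local types mirroring the sort values named in Vercel's announcement.
type GatewaySort = "cost" | "ttft" | "tps";

type GatewaySortOptions = {
  gateway: {
    sort: GatewaySort;
  };
};

// Builds the providerOptions object for a given ranking priority.
function buildGatewayOptions(sort: GatewaySort): GatewaySortOptions {
  return { gateway: { sort } };
}

// Cheapest provider first, e.g. for a high-volume agent loop.
const providerOptions = buildGatewayOptions("cost");

// With the AI SDK installed, the object would be passed roughly like this
// (the model ID is a placeholder, not a recommendation):
//
//   import { generateText } from "ai";
//   const { text } = await generateText({
//     model: "provider/model-name",
//     prompt: "Summarize this ticket.",
//     providerOptions,
//   });

console.log(JSON.stringify(providerOptions));
```

Switching priorities is then a one-word change, for example buildGatewayOptions("ttft") for a chat UI where perceived latency matters most.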

Vercel Developers (@vercel_dev) on X:

Sort providers by metric on AI Gateway:
▪ sort: 'cost' → cheapest first
▪ sort: 'ttft' → lowest latency first
▪ sort: 'tps' → highest throughput first
https://t.co/BwcAn8SFsy


Still wondering? A few quick answers below.

The sort feature is a programmatic control that lets developers rank the AI providers serving the same model by a specific metric. Instead of following a static order, the gateway evaluates providers at request time so that traffic is routed to the host that best fits your current priority, whether that is cost or performance.

Developers set a sort parameter in the gateway provider options to rank hosts by cost, time to first token, or tokens per second. Because the gateway recalculates these rankings in real time, it picks up price changes and performance fluctuations across providers without any code or configuration changes on your side.

You can sort providers by three metrics: cost, time to first token, and throughput. Sorting by cost puts the lowest price per million tokens first, while sorting by time to first token minimizes initial latency. Sorting by throughput puts the provider with the highest tokens per second first, which suits long-form generation, where total completion time depends mostly on output speed.
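
The direction of each sort can be sketched with a small local ranking function: cost and time to first token sort ascending (lower is better), while throughput sorts descending (higher is better). The ProviderStats fields, host names, and numbers are hypothetical; the real gateway ranks on its own live measurements, and this is not its implementation.

```typescript
type SortMetric = "cost" | "ttft" | "tps";

// Hypothetical per-provider snapshot for one model.
interface ProviderStats {
  name: string;
  costPerMTok: number; // USD per million tokens
  ttftMs: number;      // time to first token, milliseconds
  tps: number;         // output tokens per second
}

// One comparator per metric; each puts the "best" provider first.
const comparators: Record<SortMetric, (a: ProviderStats, b: ProviderStats) => number> = {
  cost: (a, b) => a.costPerMTok - b.costPerMTok, // cheapest first
  ttft: (a, b) => a.ttftMs - b.ttftMs,           // lowest latency first
  tps: (a, b) => b.tps - a.tps,                  // highest throughput first
};

// Returns a new ranked array; the input list is left untouched.
function rankProviders(providers: readonly ProviderStats[], metric: SortMetric): ProviderStats[] {
  return [...providers].sort(comparators[metric]);
}

const providers: ProviderStats[] = [
  { name: "host-a", costPerMTok: 0.6, ttftMs: 420, tps: 95 },
  { name: "host-b", costPerMTok: 0.9, ttftMs: 180, tps: 140 },
  { name: "host-c", costPerMTok: 0.5, ttftMs: 650, tps: 210 },
];

for (const metric of ["cost", "ttft", "tps"] as const) {
  console.log(metric, rankProviders(providers, metric).map((p) => p.name).join(" > "));
}
```

With these sample numbers, no single host wins every metric, which is the situation the sort option is meant to resolve.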

The sort feature also works alongside other gateway controls such as Zero Data Retention and manual ordering. For example, you can filter for providers that meet specific security compliance standards and then sort that subset by latency. If you specify a manual order, those providers are tried first, and the remaining hosts are sorted after them.
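
The interaction between filtering, manual ordering, and sorting can be sketched as three steps: drop ineligible providers, keep any manually ordered providers at the front in the order given, then rank the rest by latency. The zdr flag, the routeOrder helper, and the choice to skip a pinned provider that fails the filter are assumptions for illustration; the gateway's actual option names and edge-case behavior may differ.

```typescript
// Hypothetical provider record.
interface Provider {
  name: string;
  ttftMs: number; // time to first token, milliseconds
  zdr: boolean;   // hypothetical: meets a zero-data-retention requirement
}

interface RouteOptions {
  requireZdr: boolean;
  manualOrder: readonly string[];
}

function routeOrder(providers: readonly Provider[], options: RouteOptions): string[] {
  // 1. Filter: drop providers that fail the compliance requirement.
  const eligible = providers.filter((p) => !options.requireZdr || p.zdr);
  const eligibleNames = eligible.map((p) => p.name);

  // 2. Pinned providers keep the caller's order; unknown, filtered-out,
  //    or duplicate names are skipped.
  const pinned = options.manualOrder.filter(
    (name, i, all) => all.indexOf(name) === i && eligibleNames.indexOf(name) !== -1,
  );

  // 3. Every other eligible provider is ranked by latency (lowest TTFT first).
  const rest = eligible
    .filter((p) => pinned.indexOf(p.name) === -1)
    .sort((a, b) => a.ttftMs - b.ttftMs)
    .map((p) => p.name);

  return pinned.concat(rest);
}

const hosts: Provider[] = [
  { name: "host-a", ttftMs: 420, zdr: true },
  { name: "host-b", ttftMs: 180, zdr: false },
  { name: "host-c", ttftMs: 650, zdr: true },
  { name: "host-d", ttftMs: 300, zdr: true },
];

console.log(routeOrder(hosts, { requireZdr: true, manualOrder: ["host-c", "host-b"] }));
```

Here host-b is pinned but fails the filter, so it is dropped; host-c stays first as pinned, and host-d and host-a follow in latency order.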

Every API response includes a routing metadata block that details the execution order and the specific metrics used for the decision. You can see exactly why a request landed with a specific provider, which hosts were considered, and whether any providers were deprioritized because of health issues or high latency.
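
The gateway's actual metadata fields are not reproduced here. As a rough illustration of the kind of decision such a block records, the sketch below ranks providers by cost, moves any flagged as unhealthy to the back, and attaches a reason to each position. The healthy flag, the RoutingEntry shape, and the reason strings are all hypothetical and do not reflect the gateway's response format.

```typescript
// Hypothetical candidate shape.
interface Candidate {
  name: string;
  costPerMTok: number; // USD per million tokens
  healthy: boolean;    // hypothetical health flag
}

// Hypothetical trace entry; NOT the gateway's routing metadata format.
interface RoutingEntry {
  name: string;
  reason: string;
}

// Ranks healthy providers cheapest first and keeps degraded ones as a last
// resort, recording why each provider landed where it did.
function explainRouting(candidates: readonly Candidate[]): RoutingEntry[] {
  const byCost = [...candidates].sort((a, b) => a.costPerMTok - b.costPerMTok);
  const ranked: RoutingEntry[] = byCost
    .filter((c) => c.healthy)
    .map((c) => ({ name: c.name, reason: `sorted by cost: ${c.costPerMTok} USD per million tokens` }));
  const deprioritized: RoutingEntry[] = byCost
    .filter((c) => !c.healthy)
    .map((c) => ({ name: c.name, reason: "deprioritized: failing health check" }));
  return ranked.concat(deprioritized);
}

const candidates: Candidate[] = [
  { name: "host-a", costPerMTok: 0.6, healthy: true },
  { name: "host-b", costPerMTok: 0.9, healthy: true },
  { name: "host-c", costPerMTok: 0.5, healthy: false },
];

for (const entry of explainRouting(candidates)) {
  console.log(`${entry.name}: ${entry.reason}`);
}
```

Even though host-c is the cheapest, it ends up last, and the trace records why rather than leaving the ordering unexplained.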
