OpenRouter Launches Unified Audio Endpoints to Simplify Multi-Provider Voice Agents

OpenRouter

May 7, 2026 · Updated May 15, 2026

OpenRouter introduced dedicated text-to-speech and transcription endpoints that integrate with its existing unified API and billing system. By aggregating audio models from providers like Google and OpenAI, the update allows developers to build voice agents with automatic fallbacks and centralized observability.

OpenRouter, a unified API platform for accessing hundreds of language models, launched two dedicated audio endpoints for text-to-speech and transcription. These endpoints let developers process audio with the same API keys and billing infrastructure they already use for text models and OpenRouter's video generation API.

TTS endpoint: /api/v1/audio/speech
STT endpoint: /api/v1/audio/transcriptions
Providers supported: OpenAI, Google, Mistral, and others
API compatibility: OpenAI Audio Speech API (for TTS)
Input format: Base64-encoded audio (for STT)
Availability: Live for all OpenRouter users

Building reliable voice agents currently requires managing fragmented SDKs for providers like Google and Groq. This update applies OpenRouter's aggregation model to the audio stack, offering a single interface that handles model routing and automatic fallbacks. It follows the platform's recent introduction of the OpenRouter Audio Input leaderboard.

You can now integrate these endpoints to swap between audio providers without changing code. The text-to-speech endpoint is compatible with the OpenAI Audio Speech API, while the transcription endpoint accepts base64-encoded audio. Both are live today, providing a consolidated view of audio usage alongside standard metrics.
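Because the text-to-speech endpoint is described as compatible with the OpenAI Audio Speech API, a request can be sketched with the `model`, `input`, and `voice` fields that API uses. The sketch below is illustrative rather than official: the `https://openrouter.ai` host, the Bearer authorization header, and the assumption that the response body is raw audio bytes are not stated in the announcement, and the model and voice values passed in are placeholders to be replaced with identifiers listed on the platform.

```python
import json
import urllib.request

# ASSUMPTION: host name; only the /api/v1/audio/speech path comes from the article.
OPENROUTER_BASE = "https://openrouter.ai"
SPEECH_PATH = "/api/v1/audio/speech"


def build_speech_request(api_key, text, model, voice):
    """Build (but do not send) a TTS request in the OpenAI Audio Speech shape."""
    body = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        OPENROUTER_BASE + SPEECH_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer-token auth, as used for other OpenRouter requests.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key, text, model, voice, out_path):
    """Send the request and write the returned bytes to out_path.

    ASSUMPTION: the response body is the audio file itself.
    """
    req = build_speech_request(api_key, text, model, voice)
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Under these assumptions, switching providers would mean changing only the `model` (and possibly `voice`) argument, while the request-building code stays the same.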

View the full update on openrouter.ai

OpenRouter

@OpenRouter · May 7

1/ Audio is now first-class on OpenRouter. Two new endpoints live today: 📢 /api/v1/audio/speech — text-to-speech (TTS) 🎤 /api/v1/audio/transcriptions — speech-to-text (SST) Same routing, billing, and keys you already use for text, image, and video. https://t.co/6uHeEUuDl5

Still wondering? A few quick answers below.

What are the new OpenRouter audio endpoints?

OpenRouter has introduced two dedicated endpoints for audio processing. The /api/v1/audio/speech endpoint handles text-to-speech tasks, while the /api/v1/audio/transcriptions endpoint is used for speech-to-text or transcription services. These allow developers to integrate audio capabilities into their applications using the same unified API structure they already use for text and video models.

How do the OpenRouter audio APIs work?

These endpoints function as a unified interface for multiple audio providers. The text-to-speech endpoint is designed to be compatible with the OpenAI Audio Speech API, making it easier for developers to switch providers. For transcriptions, the system accepts base64-encoded audio files and returns a JSON response containing the transcribed text, providing a standardized way to handle speech data.
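The transcription flow (base64-encoded audio in, JSON with the transcript out) can be sketched as follows. Only the base64 encoding and the JSON response are taken from the announcement; the request field names `input_audio`, `data`, and `format`, and the response key `text`, are assumptions made for illustration and should be checked against OpenRouter's API reference.

```python
import base64
import json


def encode_audio(raw_audio: bytes) -> str:
    """Base64-encode raw audio bytes so they can travel inside a JSON body."""
    return base64.b64encode(raw_audio).decode("ascii")


def build_transcription_body(raw_audio: bytes, model: str, audio_format: str) -> dict:
    """Assemble a JSON-serializable request body for a transcription call.

    ASSUMPTION: the "input_audio", "data", and "format" field names are
    hypothetical; the article only says the endpoint accepts base64 audio.
    """
    return {
        "model": model,
        "input_audio": {"data": encode_audio(raw_audio), "format": audio_format},
    }


def extract_text(response_json: str) -> str:
    """Pull the transcript out of the JSON response.

    ASSUMPTION: the transcript is stored under a top-level "text" key.
    """
    return json.loads(response_json)["text"]
```

The body returned by `build_transcription_body` would be serialized with `json.dumps` and sent as a POST to /api/v1/audio/transcriptions.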

Which models are supported by OpenRouter audio?

OpenRouter provides access to a growing range of audio models from major providers including OpenAI, Google, Mistral, and Groq. By using a single API, developers can access these different models without integrating separate SDKs for each vendor. This setup allows for automatic fallbacks and easier observability across the various speech-to-text and text-to-speech services available on the platform.
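The announcement says fallbacks are handled by the platform, so no client code is needed for them. Purely to illustrate what "automatic fallback" means, the sketch below shows the pattern a team would otherwise maintain by hand across separate vendor SDKs: try each backend in order and return the first success. The function name and the backend callables are hypothetical and do not reflect OpenRouter's internal routing.

```python
from typing import Callable, Sequence, Tuple

# A backend is a (name, callable) pair; the callable takes audio bytes and
# returns a transcript. These are hypothetical stand-ins for vendor SDK calls.
Backend = Tuple[str, Callable[[bytes], str]]


def transcribe_with_fallback(audio: bytes, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, transcript) from the first backend that succeeds."""
    errors = []
    for name, call in backends:
        try:
            return name, call(audio)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

With a routing layer in front, this loop and its per-vendor error handling move out of application code.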

How is billing handled for OpenRouter audio services?

Audio services on OpenRouter are integrated into the platform's existing unified billing system. Developers use the same API keys and a single bill for audio tasks as they do for text, image, and video generation. This consolidation simplifies financial management for teams building multimodal applications that rely on several types of AI models across different providers.

What is the difference between OpenRouter TTS and STT?

Text-to-speech, or TTS, uses the /api/v1/audio/speech endpoint to convert written text into spoken audio output. Speech-to-text, or STT, uses the /api/v1/audio/transcriptions endpoint to convert audio recordings into written text. Both are treated as first-class features on the platform, meaning they receive the same level of routing, observability, and infrastructure support as standard language models.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

OpenRouter Adds xAI Creative Stack for Unified Video and Voice Generation

OpenRouter integrated xAI's multimodal suite, enabling developers to generate photorealistic images, short video clips, and natural speech through a single API. The update allows for complex creative workflows that combine xAI's generative models with existing reasoning and coding tools on the platform.

ElevenLabs · May 20

ElevenLabs Launches Speech Engine for Plug and Play Voice Agent Upgrades

ElevenLabs released Speech Engine, a unified pipeline that combines transcription, speech synthesis, and conversational orchestration into a single API. The tool allows developers to add a low-latency voice layer to existing text-based agents without rearchitecting their underlying model or retrieval systems.

Warp · May 26

Warp Terminal Integrates OpenRouter to Provide Direct Access to Multi-Model Inference

Warp added native support for OpenRouter, allowing developers to connect their own API keys to access a wide range of frontier and open-weight models directly from the terminal. This integration enables users to bypass platform credits by routing agent requests through their personal OpenRouter accounts for better pricing and uptime.
