1/ Audio is now first-class on OpenRouter. Two new endpoints live today: /api/v1/audio/speech for text-to-speech (TTS) and /api/v1/audio/transcriptions for speech-to-text (STT). Same routing, billing, and keys you already use for text, image, and video. https://t.co/6uHeEUuDl5
OpenRouter Launches Unified Audio Endpoints to Simplify Multi-Provider Voice Agents
OpenRouter introduced dedicated text-to-speech and transcription endpoints that integrate with its existing unified API and billing system. By aggregating audio models from providers like Google and OpenAI, the update allows developers to build voice agents with automatic fallbacks and centralized observability.
- TTS endpoint: /api/v1/audio/speech
- STT endpoint: /api/v1/audio/transcriptions
- Providers supported: OpenAI, Google, Mistral, and others
- API compatibility: OpenAI Audio Speech API (for TTS)
- Input format: Base64-encoded audio (for STT)
- Availability: Live for all OpenRouter users
Building reliable voice agents currently requires managing fragmented SDKs for providers like Google and Groq. This update applies OpenRouter's aggregation model to the audio stack, offering a single interface that handles model routing and automatic fallbacks. It follows the platform's recent Audio Input leaderboard.
You can now integrate these endpoints to swap between audio providers without changing code. The text-to-speech endpoint is compatible with the OpenAI Audio Speech API, while the transcription endpoint accepts base64-encoded audio. Both are live today, providing a consolidated view of audio usage alongside standard metrics.
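As a rough illustration of the two request shapes described above, here is a minimal Python sketch that only builds the HTTP requests. The endpoint paths come from the announcement, and the TTS body uses the `model`, `input`, and `voice` fields of the OpenAI Audio Speech API that the endpoint is said to be compatible with. Everything else is an assumption for illustration: the model names are placeholders, `alloy` is just an example voice, and the transcription field names (`input_audio`, `data`, `format`) are guesses, since the announcement only says the endpoint accepts base64-encoded audio. Check the OpenRouter documentation before relying on any of it.

```python
import base64
import json
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"


def _post_json(path: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_speech_request(api_key: str, text: str, model: str, voice: str):
    """TTS request using the OpenAI Audio Speech body fields."""
    body = {"model": model, "input": text, "voice": voice}
    return _post_json("/audio/speech", api_key, body)


def build_transcription_request(api_key: str, audio: bytes, model: str, fmt: str):
    """STT request carrying base64-encoded audio.

    ASSUMPTION: the field names "input_audio", "data", and "format" are
    hypothetical; the announcement does not specify the request schema.
    """
    encoded = base64.b64encode(audio).decode("ascii")
    body = {"model": model, "input_audio": {"data": encoded, "format": fmt}}
    return _post_json("/audio/transcriptions", api_key, body)


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # "provider/tts-model" is a placeholder, not a real model identifier.
    req = build_speech_request("YOUR_API_KEY", "Hello!", "provider/tts-model", "alloy")
    print(req.get_method(), req.full_url)
```

Under these assumptions, switching audio providers would mean changing only the `model` string passed to the builder, while the URL, headers, and key stay the same.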
Every HeadsUpAI update is written based on its original source and reviewed before it's published.

