Introducing Speech Engine. Developers can now turn their existing chat agent into a full voice agent with one prompt. Speech Engine combines our leading speech, transcription, and voice orchestration models into a single pipeline - all custom built to work best together. https://t.co/WSWM7nppwd
ElevenLabs Launches Speech Engine for Plug-and-Play Voice Agent Upgrades
ElevenLabs · Updated
ElevenLabs released Speech Engine, a unified pipeline that combines transcription, speech synthesis, and conversational orchestration into a single API. The tool allows developers to add a low-latency voice layer to existing text-based agents without rearchitecting their underlying model or retrieval systems.
- Pricing: 8 cents per minute
- Language support (TTS): 70+ languages
- Language support (STT): 90+ languages
- Availability: ElevenAPI (Node.js and Python SDKs)
- Core components: STT, TTS, turn detection, and interruption handling
Building a reliable voice agent has usually meant stitching together separate providers for transcription, speech synthesis, and turn-taking, and every handoff between them adds latency. The release follows an industry shift toward unified voice stacks, mirroring Together AI's unified voice agent cloud and OpenAI's Realtime API. Because Speech Engine manages the full voice lifecycle, developers no longer need to write custom orchestration code.
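To make "orchestration" concrete, here is a minimal sketch of the glue code a developer would otherwise write around an existing text agent: transcribe the user's audio, pass the text to the agent, synthesize the reply, and stop playback if the user starts talking. Every name in it (`VoiceTurnPipeline`, `fake_stt`, `echo_agent`, `fake_tts`) is hypothetical and stands in for a real provider; none of it is the ElevenLabs SDK or the Speech Engine API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceTurnPipeline:
    """Toy voice layer wrapped around an existing text agent.

    Each stage is a plain callable supplied by the caller. These are
    illustrative stand-ins, not ElevenLabs SDK calls.
    """

    transcribe: Callable[[bytes], str]        # STT: audio -> text
    agent: Callable[[str], str]               # existing chat agent: text -> text
    synthesize: Callable[[str], List[bytes]]  # TTS: text -> audio chunks
    played: List[bytes] = field(default_factory=list)

    def run_turn(self, user_audio: bytes, user_is_speaking: Callable[[], bool]) -> str:
        """Run one conversational turn and return the agent's text reply."""
        text = self.transcribe(user_audio)
        reply = self.agent(text)
        for chunk in self.synthesize(reply):
            # Interruption handling: stop playback as soon as the user barges in.
            if user_is_speaking():
                break
            self.played.append(chunk)
        return reply


# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> List[bytes]:
    # One "audio chunk" per word, purely for illustration.
    return [word.encode("utf-8") for word in text.split()]


if __name__ == "__main__":
    pipeline = VoiceTurnPipeline(fake_stt, echo_agent, fake_tts)
    print(pipeline.run_turn(b"hello there", lambda: False))
    print(pipeline.played)
```

In a stitched-together stack, each of the three callables would be a network call to a different vendor, and the loop above would also have to handle streaming and turn detection. That is the code a unified pipeline is meant to absorb.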
Developers can integrate the engine through the Node.js or Python SDK to convert chat workflows into voice-first experiences. It supports more than 70 languages for speech synthesis and more than 90 for transcription, and provides pre-built UI components for web and mobile apps. Speech Engine is available now through ElevenAPI at 8 cents per minute, with a path to the ElevenAgents platform.
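For budgeting, the quoted rate converts to a monthly figure with simple arithmetic. The sketch below assumes strictly linear per-minute billing with no rounding, minimums, or volume discounts, none of which the announcement specifies; the function name and the traffic numbers are hypothetical.

```python
RATE_CENTS_PER_MINUTE = 8  # price quoted in the announcement


def estimate_cost_usd(conversations: int, avg_minutes: float,
                      rate_cents: int = RATE_CENTS_PER_MINUTE) -> float:
    """Estimate spend in US dollars, assuming linear per-minute billing."""
    total_minutes = conversations * avg_minutes
    return total_minutes * rate_cents / 100


if __name__ == "__main__":
    # Hypothetical workload: 1,000 conversations averaging 3 minutes each.
    print(estimate_cost_usd(1000, 3))
```

Under those assumptions, 1,000 three-minute conversations come to 3,000 minutes, or $240.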




