ElevenLabs Launches Speech Engine for Plug and Play Voice Agent Upgrades

ElevenLabs

May 20, 2026 · Updated Jun 12, 2026

ElevenLabs released Speech Engine, a unified pipeline that combines transcription, speech synthesis, and conversational orchestration into a single API. The tool allows developers to add a low-latency voice layer to existing text-based agents without rearchitecting their underlying model or retrieval systems.

ElevenLabs, an AI voice platform, launched Speech Engine to provide a unified audio orchestration layer for existing applications. The service bundles speech-to-text, text-to-speech, and turn-taking logic into a single WebSocket-based pipeline. It sits on top of any text-based agent, handling verbal interaction while leaving core logic untouched.
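The division of labour described above can be pictured as three swappable stages wrapped around an unchanged text agent. This is a minimal, synchronous sketch under stated assumptions. `VoiceLayer`, `handle_turn`, and the toy `fake_stt`/`fake_tts` functions are hypothetical and are not part of the ElevenLabs SDK. The real service streams audio over a WebSocket and adds turn detection and interruption handling, and this sketch leaves all of that out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for the stages Speech Engine bundles.
# None of these names come from the ElevenLabs SDK.
Transcriber = Callable[[bytes], str]   # speech-to-text
Agent = Callable[[str], str]           # your existing, text-only agent
Synthesizer = Callable[[str], bytes]   # text-to-speech


@dataclass
class VoiceLayer:
    """Wraps an unchanged text agent with STT in front and TTS behind it."""
    transcribe: Transcriber
    agent: Agent
    synthesize: Synthesizer

    def handle_turn(self, user_audio: bytes) -> bytes:
        text_in = self.transcribe(user_audio)  # audio -> text
        text_out = self.agent(text_in)         # core logic stays text-only
        return self.synthesize(text_out)       # text -> audio


# Toy implementations so the sketch runs end to end.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


layer = VoiceLayer(fake_stt, echo_agent, fake_tts)
reply = layer.handle_turn(b"hello")
print(reply)  # b'You said: hello'
```

The point of the shape is that `echo_agent` never sees audio. Any text-in, text-out function can be dropped in without changing the voice stages.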

Pricing: from 8 cents per minute
Language support (TTS): 70+ languages
Language support (STT): 90+ languages
Availability: ElevenAPI (Node.js and Python SDKs)
Core components: STT, TTS, Turn Detection, and Interruption Handling

Building reliable voice agents has typically meant stitching together separate providers for transcription, synthesis, and turn-taking, and every hop between them adds latency. This release follows an industry shift toward unified voice stacks, alongside launches such as Together AI's unified voice agent cloud and OpenAI's Realtime API. Because Speech Engine manages the full voice lifecycle, developers no longer need to write their own orchestration code.

You can integrate the engine through the Node.js or Python SDKs to turn existing chat workflows into voice-first experiences. Speech synthesis covers more than 70 languages (transcription more than 90), and pre-built UI components are available for web and mobile apps. Speech Engine is available now via the ElevenAPI starting at 8 cents per minute, with a path to the fully managed ElevenAgents platform.

View the full update on elevenlabs.io

ElevenLabs

@ElevenLabs · May 20

Introducing Speech Engine. Developers can now turn their existing chat agent into a full voice agent with one prompt. Speech Engine combines our leading speech, transcription, and voice orchestration models into a single pipeline - all custom built to work best together. https://t.co/WSWM7nppwd


Still wondering? A few quick answers below.

What is ElevenLabs Speech Engine?

Speech Engine is a unified voice orchestration pipeline that allows developers to add a conversational audio layer to existing text-based agents. It combines transcription, speech synthesis, and turn-taking logic into a single integration, handling the complexities of real-time verbal communication while allowing the underlying model and business logic to remain unchanged.

How is Speech Engine different from ElevenAgents?

ElevenAgents is a fully-managed platform where ElevenLabs provides the language model, knowledge base, and tools in an all-in-one solution. In contrast, Speech Engine is designed for developers who want to bring their own language model and maintain full control over their conversation logic and server architecture while using ElevenLabs for the voice layer.

What is the pricing for ElevenLabs Speech Engine?

Speech Engine is available through the ElevenAPI starting at 8 cents per minute, and the per-minute cost decreases as usage scales. That makes it a flexible option for developers who manage their own infrastructure and want to add high-fidelity voice to their applications without a full re-architecture.
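For budgeting, the starting rate gives a simple upper-bound estimate. The sketch below assumes billing at exactly $0.08 per minute. It ignores the volume discounts, whose tiers are not published in this announcement, and any rounding of partial minutes. `estimate_cost_upper_bound` is a hypothetical helper and is not part of the ElevenLabs SDK.

```python
from decimal import Decimal

# Starting rate stated in the announcement. Volume discounts and
# partial-minute rounding are NOT modeled (assumption).
BASE_RATE_USD_PER_MIN = Decimal("0.08")


def estimate_cost_upper_bound(minutes: float) -> Decimal:
    """Upper-bound USD cost at the starting rate (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # str() avoids binary float artifacts when converting to Decimal.
    cost = Decimal(str(minutes)) * BASE_RATE_USD_PER_MIN
    return cost.quantize(Decimal("0.01"))


# e.g. 1,000 calls averaging 3.5 minutes each
print(estimate_cost_upper_bound(1_000 * 3.5))  # 280.00
```

`Decimal` is used instead of `float` so that currency amounts round predictably. Because discounts apply at scale, real invoices should come in at or below this figure.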

Which language models are supported by Speech Engine?

Speech Engine supports any language model that produces text. The developer kit includes built-in stream extraction for major providers like OpenAI, Anthropic, and Google Gemini. For other models, developers can pass plain strings or an asynchronous iterable of string chunks to the engine to generate the corresponding human-like voice responses.
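Those two fallback input shapes, a plain string or an asynchronous iterable of string chunks, can be normalized into a single stream before synthesis. This is a minimal sketch that assumes nothing about the real SDK. `iter_text`, `collect`, and `fake_llm_stream` are hypothetical names, and the fake stream stands in for a model's token output.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Union

# The two input shapes the article describes (type alias is hypothetical).
TextSource = Union[str, AsyncIterable[str]]


async def iter_text(source: TextSource) -> AsyncIterator[str]:
    """Yield text chunks whether the agent returned a string or a stream."""
    if isinstance(source, str):
        yield source
        return
    async for chunk in source:
        if chunk:  # skip empty deltas that some streaming APIs emit
            yield chunk


async def fake_llm_stream() -> AsyncIterator[str]:
    # Stand-in for a model's incremental output.
    for piece in ["Hello", "", ", ", "world", "!"]:
        await asyncio.sleep(0)
        yield piece


async def collect(source: TextSource) -> str:
    # A real TTS stage would consume chunks as they arrive; here we join them.
    return "".join([chunk async for chunk in iter_text(source)])


print(asyncio.run(collect("Plain reply.")))     # Plain reply.
print(asyncio.run(collect(fake_llm_stream())))  # Hello, world!
```

Normalizing early means the downstream voice stage only ever handles one async stream, regardless of which model produced the text.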

Does Speech Engine handle user interruptions during a conversation?

Yes. Speech Engine includes dedicated models for turn detection and interruption handling. It listens for user speech while the agent is talking, and when the user cuts in, it stops audio playback immediately and switches back to listening. Developers therefore do not need to write custom logic to manage overlapping speech or background noise.
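Barge-in handling of this kind can be approximated as a race between a playback task and a user-speech signal. This is a minimal sketch that assumes a hypothetical voice-activity detector which sets an `asyncio.Event` when the user starts talking. `play_response`, `speak_with_barge_in`, and `demo` are illustrative names, not Speech Engine APIs, and "audio" here is just a list of strings.

```python
import asyncio
from typing import List, Optional


async def play_response(chunks: List[str], played: List[str]) -> None:
    """Simulate streaming agent audio, one chunk at a time."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)  # stand-in for audio playback time


async def speak_with_barge_in(chunks: List[str], user_speech: asyncio.Event) -> List[str]:
    """Play chunks until finished or until user speech is detected."""
    played: List[str] = []
    playback = asyncio.create_task(play_response(chunks, played))
    barge_in = asyncio.create_task(user_speech.wait())
    done, _ = await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    if barge_in in done:
        playback.cancel()  # user cut in: stop talking immediately
    else:
        barge_in.cancel()  # response finished: stop waiting for this turn
    # Let both tasks settle; return_exceptions swallows CancelledError.
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played


async def demo(interrupt_after: Optional[float]) -> List[str]:
    user_speech = asyncio.Event()  # created inside the running loop
    chunks = [f"chunk{i}" for i in range(10)]
    if interrupt_after is not None:
        # Hypothetical VAD firing partway through the response.
        asyncio.get_running_loop().call_later(interrupt_after, user_speech.set)
    return await speak_with_barge_in(chunks, user_speech)


played = asyncio.run(demo(0.035))
print(f"played {len(played)} of 10 chunks before the interruption")
```

The exact chunk count depends on scheduling, which is why the sketch prints it instead of asserting a fixed number. A production pipeline would also need to decide what to do with the unplayed text, which this sketch simply drops.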

Every HeadsUpAI update is written based on its original source and reviewed before it's published.



