ElevenLabs Launches Dubbing v2 to Carry Original Emotion Across 90 Languages

ElevenLabs

May 28, 2026 · Updated Jun 12, 2026

ElevenLabs released Dubbing v2, a foundational model that preserves a speaker's original tone, emotion, and delivery during translation. By conditioning output on the source audio rather than just a transcript, the system eliminates the flat quality typical of traditional AI localization.

ElevenLabs, an AI platform for voice synthesis and music generation, launched Dubbing v2 to localize content across 90+ languages. This foundational model upgrade arrives two days after the ElevenLabs Music v2 release and shifts the architecture from transcript-based generation to performance-conditioned synthesis.

Language support: 90+ languages
Core technology: Performance-conditioned synthesis
API status: Coming soon
Phrasing logic: Sync-aware translation

Traditional AI dubbing often produces flat, disconnected audio because it relies on text-to-speech from a translated transcript. Dubbing v2 instead analyzes the original speaker's intonation, energy, and pacing so that the translated version retains the intended emotional impact. ElevenLabs presents this as narrowing the gap between automated tools and professional studio dubbing.

The model includes sync-aware translation logic that automatically aligns speech timing with the original video. It is available now in ElevenCreative and through the ElevenProductions managed service, with API access coming soon. New and existing users can access up to 30 minutes of free dubbing, depending on their plan, during a seven-day introductory period.
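The announcement does not describe how the sync-aware logic works internally, so the following is only an illustrative sketch of the general timing-fit problem that any dubbing pipeline faces, not ElevenLabs' method. It assumes each original utterance has a known start and end time and that the natural length of the translated speech is known. The `Segment` type, the `fit_segment` function, and the rate limits of 0.85 and 1.2 are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # start of the original utterance, in seconds
    end: float              # end of the original utterance, in seconds
    dubbed_duration: float  # natural length of the translated speech, in seconds


def fit_segment(seg: Segment, min_rate: float = 0.85, max_rate: float = 1.2):
    """Return (playback_rate, needs_rephrase) for one utterance.

    playback_rate > 1 means the translated speech is sped up to fit the slot.
    needs_rephrase is True when even the maximum speed-up cannot make the
    speech fit, so a shorter translation would be needed instead.
    """
    slot = seg.end - seg.start
    if slot <= 0:
        raise ValueError("segment must have positive duration")
    # Rate that would make the dubbed speech exactly fill the original slot.
    ideal_rate = seg.dubbed_duration / slot
    # Keep the rate inside a range that still sounds natural (assumed limits).
    rate = min(max(ideal_rate, min_rate), max_rate)
    needs_rephrase = ideal_rate > max_rate
    return rate, needs_rephrase


if __name__ == "__main__":
    for seg in [Segment(0.0, 2.0, 2.2), Segment(2.0, 4.5, 4.0)]:
        print(seg, fit_segment(seg))
```

In this toy version, a translation that runs slightly long is sped up a little, while one that runs far too long is flagged for rephrasing. That mirrors the idea in the announcement that phrasing and timing are handled together rather than fixed by hand afterwards.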

View the full update on elevenlabs.io

ElevenLabs (@ElevenLabs) · May 28

Introducing Dubbing v2, our revolutionary new dubbing model. For the first time, the emotion and performance of the original content is carried over into every language. https://t.co/EZz6DmlbRW


Still wondering? A few quick answers below.

What is ElevenLabs Dubbing v2?

Dubbing v2 is an advanced AI model designed to localize video and audio content into more than 90 languages. Unlike previous versions that relied on text transcripts, this model conditions its output directly on the original audio performance. This allows the system to carry over the speaker's original emotion, tone, and delivery into every translated language.

How does ElevenLabs Dubbing v2 preserve emotion in translations?

The model preserves emotion by analyzing the original speaker's performance rather than just translating a text transcript. By conditioning the generation on the source audio, it captures nuances like intonation, pacing, and energy. This approach prevents the flat or disconnected sound common in traditional AI dubbing, making the translated speech feel as if the original person actually said it.

Is ElevenLabs Dubbing v2 available to everyone?

Dubbing v2 is available today for creators through the ElevenCreative platform and for enterprise users via ElevenProductions. While it is currently accessible through the web interface, API access is not yet live and is listed as coming soon. For a limited seven-day window, users on various plans can access between one and 30 minutes of free usage.

How many languages does ElevenLabs Dubbing v2 support?

The new model supports localization across more than 90 different languages. It uses sync-aware translation logic to ensure that the phrasing sounds natural in each target language while automatically aligning the starts and stops of the speech with the original content. This reduces the need for manual editing to fix timing issues in the final localized video.

What is the difference between ElevenCreative and ElevenProductions for dubbing?

ElevenCreative is a self-serve platform where creators can localize content like YouTube videos with one click. ElevenProductions is a professional managed service for studios and broadcasters that combines the Dubbing v2 model with human translators, expert voice casting, and professional mixing. Both options utilize the same underlying performance-conditioned model to ensure high-quality, expressive audio delivery across global markets.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

ElevenLabs Launches Music v2 With Inpainting and Dense Lyrical Delivery

ElevenLabs released Music v2, a foundational model upgrade that introduces precise inpainting controls and the ability to handle complex genre transitions mid-track. The update significantly lowers costs for commercial and developer tiers while improving vocal fidelity for fast-paced delivery like rap.

ElevenCreative · Mar 18

ElevenLabs Launches Flows to Chain Image Video and Audio Models

ElevenLabs launched Flows, a node-based canvas inside ElevenCreative for chaining 35+ image, video, voice, and music models into reusable creative pipelines. Batch-execute a flow with swapped inputs — different products, avatars, or voices — to produce campaign variants at scale.

Google · Jun 10

Google AI Releases Gemini 3.5 Live Translate for Natural Streaming Speech Translation

Google AI released Gemini 3.5 Live Translate, an audio model for live speech-to-speech translation. It supports over 70 languages, streaming translations continuously to maintain natural conversation flow by preserving speaker intonation and pacing. This aims to eliminate awkward pauses, making cross-language interactions feel more fluid across various applications.

Mistral AI · Mar 28

Mistral AI Launches Voxtral TTS to Challenge Proprietary Models with Open Weights

Mistral AI launched Voxtral TTS, a 4B-parameter text-to-speech model capable of zero-shot voice cloning from just three seconds of audio. By offering frontier-grade emotional expressiveness and low latency in an open-weight format, it provides a high-performance alternative to closed-source providers for building real-time voice agents.
