HeadsUpAI

ElevenLabs Launches Dubbing v2 to Carry Original Emotion Across 90 Languages

ElevenLabs, an AI platform for voice synthesis and music generation, launched Dubbing v2 to localize content across 90+ languages. This foundational model upgrade arrives two days after the ElevenLabs Music v2 release and shifts the architecture from transcript-based generation to performance-conditioned synthesis.
Language support: 90+ languages
Core technology: Performance-conditioned synthesis
API status: Coming soon
Phrasing logic: Sync-aware translation

Traditional AI dubbing often produces flat, disconnected audio because it relies on text-to-speech from a translated transcript. Dubbing v2 addresses this by analyzing the original speaker's intonation, energy, and pacing so that the translated version retains the intended emotional impact. The aim is to narrow the gap between automated tools and professional studio dubbing.

The model includes sync-aware translation logic to automatically align speech timing with the original video. It is available now in ElevenCreative and through the ElevenProductions managed service, with API access coming soon. New and existing users can access up to 30 minutes of free dubbing during a seven-day introductory period.

ElevenLabs (@ElevenLabs) on X:

Introducing Dubbing v2, our revolutionary new dubbing model. For the first time, the emotion and performance of the original content is carried over into every language. https://t.co/EZz6DmlbRW


Still wondering? A few quick answers below.

Dubbing v2 is an AI model designed to localize video and audio content into more than 90 languages. Unlike previous versions that relied on text transcripts, this model conditions its output directly on the original audio performance. This allows the system to carry the speaker's original emotion, tone, and delivery over into every target language.

The model preserves emotion by analyzing the original speaker's performance rather than just translating a text transcript. By conditioning the generation on the source audio, it captures nuances like intonation, pacing, and energy. This approach prevents the flat or disconnected sound common in traditional AI dubbing, making the translated speech feel as if the original person actually said it.

Dubbing v2 is available today for creators through the ElevenCreative platform and for enterprise users via ElevenProductions. It is currently accessible through the web interface; API access is not yet live and is listed as coming soon. During a seven-day introductory window, users can access between one and 30 minutes of free dubbing, depending on their plan.

The new model supports localization across more than 90 languages. It uses sync-aware translation logic so that the phrasing sounds natural in each target language while the starts and stops of the speech are automatically aligned with the original content. This reduces the need for manual editing to fix timing issues in the final localized video.

ElevenCreative is a self-serve platform where creators can localize content such as YouTube videos with one click. ElevenProductions is a managed service for studios and broadcasters that combines the Dubbing v2 model with human translators, expert voice casting, and professional mixing. Both options use the same underlying performance-conditioned model.
