Google AI Releases Gemini 3.5 Live Translate for Natural Streaming Speech Translation


Google AI released Gemini 3.5 Live Translate, an audio model for live speech-to-speech translation in more than 70 languages. It streams translated speech continuously and preserves the speaker's intonation and pacing, with the goal of removing the awkward pauses that break up cross-language conversations.

Google AI launched Gemini 3.5 Live Translate, an audio model for live speech-to-speech translation across more than 70 languages. This model streams translations as a speaker talks, balancing speed and quality to stay mere seconds behind the conversation. It automatically detects languages, handles multilingual inputs, and filters ambient noise.
Supported Languages: 70+
Latency: Mere seconds behind speaker
Audio Preservation: Pacing, pitch, intonation
Developer Access: Gemini Live API, Google AI Studio (public preview)
Consumer Access: Google Translate app (Android/iOS)
Enterprise Access: Google Meet (private preview)

The model aims to make cross-language communication feel natural by preserving the speaker's intonation, pacing, and pitch, unlike traditional turn-by-turn systems that introduce pauses. This continuous, low-latency approach seeks to foster more fluid and human-like interactions.
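To make the contrast with turn-by-turn systems concrete, here is a back-of-the-envelope sketch of how long a listener waits before hearing any translated audio. It is a simplified timing model, not a description of how Gemini 3.5 Live Translate works internally; the function names, the chunk size, and every number are illustrative assumptions rather than figures published by Google.

```python
def turn_based_wait(utterance_s: float, processing_s: float) -> float:
    """Seconds until the listener hears anything in a turn-by-turn system.

    Assumption: translation starts only after the speaker has finished,
    so the wait is the full utterance plus processing time.
    """
    return utterance_s + processing_s


def streaming_wait(chunk_s: float, processing_s: float) -> float:
    """Seconds until the listener hears anything in a streaming system.

    Assumption: translation starts once the first audio chunk is captured,
    so the wait does not depend on how long the speaker keeps talking.
    """
    return chunk_s + processing_s


if __name__ == "__main__":
    # Hypothetical values: a 12-second utterance, 2-second chunks,
    # and 1 second of processing in both cases.
    print("turn-by-turn:", turn_based_wait(utterance_s=12.0, processing_s=1.0))
    print("streaming:   ", streaming_wait(chunk_s=2.0, processing_s=1.0))
```

Under these assumed numbers the turn-by-turn wait grows with the length of the utterance, while the streaming wait stays fixed at the chunk length plus processing time.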

Gemini 3.5 Live Translate is available in public preview for developers via the Gemini Live API and Google AI Studio. You can also experience it in the Google Translate app on Android and iOS, with a new Android-exclusive "listening mode" for private earpiece translations. It will also roll out to Google Meet for select business customers.

Google AI (@GoogleAI) on X:

Today, we released Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation. It supports over 70 languages and starts translating as soon as you start talking, streaming translations while listening to what you say next. No awkward pauses or choppy audio, just real connection without language barriers.

So, how does it work? 🤔 The model is able to make split-second decisions to juggle speed and translation quality so conversations actually feel fluid, human, and natural. In order to do this, the model must receive and contextualize the input while simultaneously outputting the translated speech. Through this process, Gemini 3.5 Live Translate manages to stay mere seconds behind each speaker and can even maintain pacing, pitch, and intonation across extended sessions.

See it in action below, or try it yourself in the Google Translate app on iOS & Android.

274 retweets, 2.1k likes

Still wondering? A few quick answers below.

Gemini 3.5 Live Translate is Google AI's latest audio model for live speech-to-speech translation. It streams translations in near real time across more than 70 languages so that conversations stay fluid and natural.

The model processes speech as it streams, making split-second decisions to balance translation speed and quality. It continuously outputs translated speech while listening to the next input, staying just seconds behind the speaker.
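The "listen while speaking" pattern described above can be sketched as three concurrent tasks connected by queues. This is a toy illustration only: the word-lookup "translator" stands in for the real model, text chunks stand in for audio, and all names (`capture`, `translate`, `play`, `translate_stream`) are hypothetical. It does not use the Gemini Live API.

```python
import asyncio


async def capture(chunks, inbox):
    """Stand-in microphone: push source-language chunks one at a time."""
    for chunk in chunks:
        await inbox.put(chunk)
        await asyncio.sleep(0)  # yield control so other tasks can run
    await inbox.put(None)  # sentinel: the speaker has finished


async def translate(inbox, outbox, glossary):
    """Stand-in translator: emit output per chunk, not per full utterance."""
    while (chunk := await inbox.get()) is not None:
        # Unknown chunks pass through unchanged in this toy lookup.
        await outbox.put(glossary.get(chunk, chunk))
    await outbox.put(None)  # forward the sentinel to the player


async def play(outbox, heard):
    """Stand-in speaker: collect translated pieces as they arrive."""
    while (piece := await outbox.get()) is not None:
        heard.append(piece)


async def run_pipeline(chunks, glossary):
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    heard = []
    # All three stages run concurrently on the same event loop.
    await asyncio.gather(
        capture(chunks, inbox),
        translate(inbox, outbox, glossary),
        play(outbox, heard),
    )
    return heard


def translate_stream(chunks, glossary):
    """Synchronous entry point for the toy pipeline."""
    return asyncio.run(run_pipeline(chunks, glossary))


if __name__ == "__main__":
    print(translate_stream(["hola", "mundo"], {"hola": "hello", "mundo": "world"}))
```

The design point is that no stage waits for the speaker to finish: each one consumes whatever is available and hands it on, which is what lets output trail input by a bounded delay instead of a full turn.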

Gemini 3.5 Live Translate is available to developers in public preview through the Gemini Live API and Google AI Studio. Consumers can use it in the Google Translate app on Android and iOS, and it will soon be in Google Meet for select business customers.

On Android, the Google Translate app adds a new "listening mode" that plays translations through the phone's earpiece, so they can be heard privately without headphones.

The model is designed to maintain the speaker's pacing, pitch, and intonation across extended sessions, with the aim of producing more natural, human-like translated output.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.