Google Launches Gemini 3.1 Flash Live for Natural Real-Time Voice Agents

Google DeepMind

Mar 28, 2026 · Updated Apr 25, 2026

Google DeepMind released Gemini 3.1 Flash Live, a low-latency audio model optimized for real-time dialogue and complex task execution. The model improves function calling and tonal recognition, allowing voice agents to handle multi-step workflows and emotional nuances more reliably. This enables more fluid interactions in noisy environments without losing conversational context.

Google DeepMind introduced Gemini 3.1 Flash Live, a specialized model for real-time audio reasoning. It scores 90.8% on the ComplexFuncBench Audio benchmark, pointing to a substantial gain in multi-step function calling. The model also has improved tonal understanding, so it can pick up on pitch, pace, and user frustration during live interactions.
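To make "multi-step function calling" concrete, here is a minimal sketch in Python of the kind of dispatch loop a voice agent might run once a model has emitted a sequence of tool calls. It does not use the Gemini Live API or any Google SDK. The tool names (lookup_order, schedule_refund), the ToolCall structure, and the "$prev." placeholder convention are all hypothetical, chosen only to illustrate how one step's result can feed the next.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """A hypothetical function call emitted by a model: a name plus arguments."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def lookup_order(order_id: str) -> Dict[str, Any]:
    # Hypothetical tool: stands in for a real order-database query.
    return {"order_id": order_id, "amount": 42.0}


def schedule_refund(order_id: str, amount: float) -> Dict[str, Any]:
    # Hypothetical tool: stands in for a real payments call.
    return {"order_id": order_id, "refunded": amount, "status": "scheduled"}


# Registry mapping tool names to the Python callables that implement them.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "lookup_order": lookup_order,
    "schedule_refund": schedule_refund,
}


def run_workflow(calls: List[ToolCall]) -> List[Dict[str, Any]]:
    """Run tool calls in order; "$prev.<key>" arguments read from the previous result."""
    results: List[Dict[str, Any]] = []
    for call in calls:
        if call.name not in TOOLS:
            raise KeyError(f"unknown tool: {call.name}")
        resolved: Dict[str, Any] = {}
        for key, value in call.args.items():
            if isinstance(value, str) and value.startswith("$prev."):
                if not results:
                    raise ValueError("no previous result to reference")
                value = results[-1][value[len("$prev."):]]
            resolved[key] = value
        results.append(TOOLS[call.name](**resolved))
    return results


# A two-step workflow: the refund step depends on the lookup step's output.
workflow = [
    ToolCall("lookup_order", {"order_id": "A-100"}),
    ToolCall("schedule_refund", {"order_id": "$prev.order_id", "amount": "$prev.amount"}),
]
final = run_workflow(workflow)[-1]
print(final["status"])
```

In a real agent the list of calls would arrive one at a time from the model rather than as a fixed list, and each result would be sent back to the model before it decides on the next step. The sketch only shows the chaining logic that a multi-step benchmark exercises.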

Most voice interfaces struggle with interruptions, background noise, and long-term context. This update doubles the conversation thread length, allowing the AI to maintain a train of thought during extended brainstorms. By processing audio natively rather than converting to text first, it reduces latency and captures acoustic nuances that text-only models miss.

The model is available in preview through the Gemini Live API in Google AI Studio for building low-latency voice agents. It is also integrated into Gemini Enterprise for Customer Experience for automated support. All generated audio is watermarked with SynthID to help identify AI-generated content across 200 countries.

View the full update on blog.google

Google DeepMind

@GoogleDeepMind · Mar 26

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵

