Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google DeepMind

Mar 3, 2026 · Updated Apr 25, 2026

Google DeepMind released Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 series model, priced at $0.25 per 1M input tokens. It outperforms Gemini 2.5 Flash and responds 2.5x faster, making high-volume AI tasks cheaper and quicker to run.

Gemini 3.1 Flash-Lite is Google DeepMind's most cost-efficient Gemini 3 model, priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens. According to Artificial Analysis, an independent AI benchmarking and analysis company, it delivers a 2.5x faster Time to First Answer Token and 45% higher output speed than gemini-2.5-flash. It scores 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming larger prior-generation Gemini models. A new "thinking levels" feature lets developers set how deeply the model reasons for each workload, from high-volume translation and content moderation to UI generation and multi-step simulations.
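To make the quoted prices concrete, here is a minimal Python sketch that estimates what a workload would cost at $0.25 per 1M input tokens and $1.50 per 1M output tokens. The function name and the sample workload (10,000 requests of 800 input and 200 output tokens each) are hypothetical and chosen only for illustration. Actual billing may differ, for example if thinking tokens or other charges apply.

```python
# Prices quoted in the article, in US dollars per 1M tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate workload cost at the quoted list prices (illustrative helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, 800 input and 200 output tokens each.
num_requests = 10_000
cost = estimate_cost_usd(num_requests * 800, num_requests * 200)

# 8M input tokens ($2.00) + 2M output tokens ($3.00) = $5.00
print(f"${cost:.2f}")
```

Because output tokens cost six times as much as input tokens, the output side dominates the bill in this example even though it accounts for only a fifth of the tokens.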

The release brings affordable, fast reasoning to developer teams running inference at scale. The improved performance per dollar gives product teams a credible option for latency-sensitive pipelines without the cost of larger models.
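For a latency-sensitive pipeline, the two speed figures combine roughly as time to first token plus generation time. The Python sketch below works through that arithmetic. It assumes "2.5x faster" means the time to first token is divided by 2.5 and "45% higher output speed" means tokens per second are multiplied by 1.45. The baseline numbers (1.0 s, 100 tokens/s, 500 output tokens) are hypothetical placeholders, not measurements of either model.

```python
def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate end-to-end time: time to first token plus generation time."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, chosen only to illustrate the arithmetic.
BASELINE_TTFT_S = 1.0
BASELINE_TOKENS_PER_S = 100.0
OUTPUT_TOKENS = 500

# Apply the article's ratios under the assumptions stated above.
faster_ttft_s = BASELINE_TTFT_S / 2.5
faster_tokens_per_s = BASELINE_TOKENS_PER_S * 1.45

baseline = response_time_s(BASELINE_TTFT_S, BASELINE_TOKENS_PER_S, OUTPUT_TOKENS)
improved = response_time_s(faster_ttft_s, faster_tokens_per_s, OUTPUT_TOKENS)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

Under these assumptions, short responses benefit mostly from the lower time to first token, while long responses benefit mostly from the higher output speed.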

Build via the Gemini API in Google AI Studio using gemini-3.1-flash-lite-preview, or through Vertex AI for enterprise deployments. Use thinking levels to tune cost and reasoning depth per workload.
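As a rough illustration of choosing a thinking level per workload, the Python sketch below builds a request body without sending it. Only the model ID comes from the article. The workload-to-level mapping, the level names ("low", "high"), and the `generationConfig.thinkingConfig.thinkingLevel` field names are assumptions about the Gemini API request schema and should be checked against the official API reference before use.

```python
import json

MODEL = "gemini-3.1-flash-lite-preview"  # model ID named in the article

# Hypothetical mapping from workload type to thinking level.
# The level names are assumed, not taken from the article.
THINKING_LEVEL_BY_WORKLOAD = {
    "translation": "low",
    "content_moderation": "low",
    "ui_generation": "high",
    "simulation": "high",
}


def build_request(workload: str, prompt: str) -> dict:
    """Build an illustrative request; the body field names are assumed."""
    level = THINKING_LEVEL_BY_WORKLOAD[workload]
    return {
        "model": MODEL,
        "body": {
            "contents": [{"parts": [{"text": prompt}]}],
            "generationConfig": {"thinkingConfig": {"thinkingLevel": level}},
        },
    }


request = build_request("translation", "Translate 'hello' to French.")
print(json.dumps(request, indent=2))
```

Keeping the mapping in one table makes it easy to lower the level for cheap, high-volume tasks and raise it only where deeper reasoning justifies the extra cost.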

View the full update on blog.google

Google DeepMind @GoogleDeepMind · Mar 3

Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵
