Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google DeepMindGoogle DeepMind

· Updated

Google DeepMind released Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 series model, priced at $0.25/1M input tokens. It outperforms 2.5 Flash with 2.5X faster response times, making high-volume AI tasks significantly cheaper and quicker to run.

Gemini 3.1 Flash-Lite is Google DeepMind's most cost-efficient Gemini 3 model, priced at $0.25/1M input tokens and $1.50/1M output tokens. It delivers a 2.5X faster Time to First Answer Token and 45% higher output speed than gemini-2.5-flash, per Artificial Analysis, an independent AI benchmarking and analysis company. It scores 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming larger prior-generation Gemini models. A new "thinking levels" feature lets developers dial in how deeply the model reasons — from high-volume translation and content moderation to UI generation and multi-step simulations.

This brings affordable, fast reasoning to developer teams running inference at scale. The performance-per-dollar improvement gives product teams a credible option for latency-sensitive pipelines without the cost of larger models.

Build via the Gemini API in Google AI Studio using gemini-3.1-flash-lite-preview, or through Vertex AI for enterprise deployments. Use thinking levels to tune cost and reasoning depth per workload.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Share this update