HeadsUpAI

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

· Updated

Gemini 3.1 Flash-Lite is Google DeepMind's most cost-efficient Gemini 3 model, priced at $0.25/1M input tokens and $1.50/1M output tokens. It delivers a 2.5X faster Time to First Answer Token and 45% higher output speed than gemini-2.5-flash, per Artificial Analysis, an independent AI benchmarking and analysis company. It scores 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming larger prior-generation Gemini models. A new "thinking levels" feature lets developers dial in how deeply the model reasons — from high-volume translation and content moderation to UI generation and multi-step simulations.

This brings affordable, fast reasoning to developer teams running inference at scale. The performance-per-dollar improvement gives product teams a credible option for latency-sensitive pipelines without the cost of larger models.

Build via the Gemini API in Google AI Studio using gemini-3.1-flash-lite-preview, or through Vertex AI for enterprise deployments. Use thinking levels to tune cost and reasoning depth per workload.

Share this update