Gemini 3.1 Flash Lite from @GoogleDeepMind is now GA on OpenRouter. Multimodal (text/image/video/audio/PDF → text), 1M context, selectable thinking levels, at $0.25/M in / $1.50/M out. Also works with our new service_tier param for cost/latency tradeoffs!
OpenRouter Adds Gemini 3.1 Flash Lite With 1M Context and Service Tiers
OpenRouter integrated Google's Gemini 3.1 Flash Lite into its unified API, offering a 1 million token context window and multimodal processing for $0.25 per million input tokens. The update introduces a service tier parameter that allows developers to trade off latency for lower costs on high-volume agentic tasks.
- Context window
- 1M tokens
- Pricing (input)
- $0.25 per million tokens
- Pricing (output)
- $1.50 per million tokens
- Multimodal inputs
- Text, image, video, and more
- Service tiers
- Standard, flex, and priority
- Preview shutdown
- May 25, 2026
The release targets the "efficiency frontier" where high-volume agentic loops (autonomous multi-step AI task cycles) require long context and low costs. By supporting Google's flex and priority tiers through a new service_tier parameter, OpenRouter allows users to optimize spend based on urgency. This mirrors Gemini's multimodal dominance.
Access the model via the google/gemini-3.1-flash-lite endpoint at $0.25 per million input and $1.50 per million output tokens. Developers using the preview version must migrate before the May 25, 2026 shutdown. The service_tier parameter is now available for Google and OpenAI models to manage routing.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →




