Gemini 3.1 Flash Lite from @GoogleDeepMind is now GA on OpenRouter. Multimodal (text/image/video/audio/PDF → text), 1M context, selectable thinking levels, at $0.25/M in / $1.50/M out. Also works with our new service_tier param for cost/latency tradeoffs!
OpenRouter Adds Gemini 3.1 Flash Lite With 1M Context and Service Tiers
OpenRouter, a unified API for accessing hundreds of LLMs, launched the general availability of Google's Gemini 3.1 Flash Lite. This high-efficiency model supports a 1 million token context window (the total data a model processes at once) and native multimodal inputs. It follows OpenRouter's latest model aliases.
- Context window
- 1M tokens
- Pricing (input)
- $0.25 per million tokens
- Pricing (output)
- $1.50 per million tokens
- Multimodal inputs
- Text, image, video, and more
- Service tiers
- Standard, flex, and priority
- Preview shutdown
- May 25, 2026
The release targets the "efficiency frontier" where high-volume agentic loops (autonomous multi-step AI task cycles) require long context and low costs. By supporting Google's flex and priority tiers through a new service_tier parameter, OpenRouter allows users to optimize spend based on urgency. This mirrors Gemini's multimodal dominance.
Access the model via the google/gemini-3.1-flash-lite endpoint at $0.25 per million input and $1.50 per million output tokens. Developers using the preview version must migrate before the May 25, 2026 shutdown. The service_tier parameter is now available for Google and OpenAI models to manage routing.
OpenRouter
@OpenRouter
12retweets351likes
View on XStill wondering? A few quick answers below.
Gemini 3.1 Flash Lite is a high-efficiency multimodal model from Google designed for high-volume, cost-sensitive tasks. It features a 1 million token context window and supports selectable thinking levels to control reasoning depth. The model is optimized for low-latency applications and agentic workflows where speed and budget are primary considerations.
On OpenRouter, Gemini 3.1 Flash Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens. This pricing applies to the general availability version of the model. Users can further manage costs by using the service tier parameter to choose between different latency and price tradeoffs for their API requests.
OpenRouter allows developers to pass a service tier parameter to trade off cost against latency. The available tiers are standard, flex, and priority. This feature currently supports both Google and OpenAI models, and the specific tier used to serve the request is included in the API response, helping developers optimize their high-volume agentic workloads.
The preview version of the model, identified as gemini-3.1-flash-lite-preview, is scheduled for deprecation on May 11, 2026. Following this deprecation period, the preview endpoint will officially shut down on May 25, 2026. Developers are encouraged to migrate their applications to the general availability version of the model before these dates to avoid service interruptions.
Gemini 3.1 Flash Lite is a natively multimodal model that can process a wide variety of input types to generate text outputs. Supported formats include text, images, video, audio, and PDF documents. This capability allows the model to handle complex data extraction and analysis tasks across different media formats within its 1 million token context window.


