OpenRouter Adds Gemini 3.1 Flash Lite With 1M Context and Service Tiers

OpenRouter

May 7, 2026

OpenRouter integrated Google's Gemini 3.1 Flash Lite into its unified API, offering a 1 million token context window and multimodal processing for $0.25 per million input tokens. The update introduces a service tier parameter that allows developers to trade off latency for lower costs on high-volume agentic tasks.

OpenRouter, a unified API for accessing hundreds of LLMs, launched the general availability of Google's Gemini 3.1 Flash Lite. This high-efficiency model supports a 1 million token context window (the total data a model processes at once) and native multimodal inputs. It follows OpenRouter's latest model aliases.

Context window: 1M tokens
Pricing (input): $0.25 per million tokens
Pricing (output): $1.50 per million tokens
Multimodal inputs: Text, image, video, and more
Service tiers: Standard, flex, and priority
Preview shutdown: May 25, 2026

The release targets the "efficiency frontier" where high-volume agentic loops (autonomous multi-step AI task cycles) require long context and low costs. By supporting Google's flex and priority tiers through a new service_tier parameter, OpenRouter allows users to optimize spend based on urgency. This mirrors Gemini's multimodal dominance.

Access the model via the google/gemini-3.1-flash-lite endpoint at $0.25 per million input and $1.50 per million output tokens. Developers using the preview version must migrate before the May 25, 2026 shutdown. The service_tier parameter is now available for Google and OpenAI models to manage routing.

View the full update on openrouter.ai

OpenRouter

@OpenRouterMay 7

Gemini 3.1 Flash Lite from @GoogleDeepMind is now GA on OpenRouter. Multimodal (text/image/video/audio/PDF → text), 1M context, selectable thinking levels, at $0.25/M in / $1.50/M out. Also works with our new service_tier param for cost/latency tradeoffs!

12351

View on X

Still wondering? A few quick answers below.

Gemini 3.1 Flash Lite is a high-efficiency multimodal model from Google designed for high-volume, cost-sensitive tasks. It features a 1 million token context window and supports selectable thinking levels to control reasoning depth. The model is optimized for low-latency applications and agentic workflows where speed and budget are primary considerations.

On OpenRouter, Gemini 3.1 Flash Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens. This pricing applies to the general availability version of the model. Users can further manage costs by using the service tier parameter to choose between different latency and price tradeoffs for their API requests.

OpenRouter allows developers to pass a service tier parameter to trade off cost against latency. The available tiers are standard, flex, and priority. This feature currently supports both Google and OpenAI models, and the specific tier used to serve the request is included in the API response, helping developers optimize their high-volume agentic workloads.

The preview version of the model, identified as gemini-3.1-flash-lite-preview, is scheduled for deprecation on May 11, 2026. Following this deprecation period, the preview endpoint will officially shut down on May 25, 2026. Developers are encouraged to migrate their applications to the general availability version of the model before these dates to avoid service interruptions.

Gemini 3.1 Flash Lite is a natively multimodal model that can process a wide variety of input types to generate text outputs. Supported formats include text, images, video, audio, and PDF documents. This capability allows the model to handle complex data extraction and analysis tasks across different media formats within its 1 million token context window.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenRouter →

Keep reading

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter integrated Google's Gemini 3.5 Flash, a model that outperforms the previous 3.1 Pro version in coding and tool use at a lower price point. With a 1 million token context window and adjustable thinking levels, it provides a cost-effective alternative for complex autonomous workflows.

Google DeepMindMar 3

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google DeepMind released Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 series model, priced at $0.25/1M input tokens. It outperforms 2.5 Flash with 2.5X faster response times, making high-volume AI tasks significantly cheaper and quicker to run.

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

VercelMay 19

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

Vercel added Google's Gemini 3.5 Flash to its AI Gateway, enabling developers to deploy the new reasoning model with no price markup or additional provider accounts. The integration prioritizes agentic performance, replacing traditional sampling parameters with simplified thinking levels to optimize multi-turn reasoning.

GoogleApr 2

Google Launches Flex and Priority Tiers to Balance Agentic Workload Costs

Google introduced Flex and Priority inference tiers to the Gemini API, allowing developers to choose between 50% cost savings or maximum reliability. This shift enables teams to optimize expensive thinking tokens for background agents while ensuring user-facing interactions remain instant and dependable.

What is Gemini 3.1 Flash Lite?

What is the pricing for Gemini 3.1 Flash Lite on OpenRouter?

How do OpenRouter service tiers work for Gemini models?

When will the Gemini 3.1 Flash Lite preview be shut down?

What multimodal inputs does Gemini 3.1 Flash Lite support?

Keep reading

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

Google Launches Flex and Priority Tiers to Balance Agentic Workload Costs

Google Launches Flex and Priority Tiers to Balance Agentic Workload Costs

Keep reading

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Google Launches Gemini 3.1 Flash-Lite for High-Volume AI Workloads

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

Vercel Integrates Gemini 3.5 Flash for Parallel Agentic Workflows

Google Launches Flex and Priority Tiers to Balance Agentic Workload Costs

Google Launches Flex and Priority Tiers to Balance Agentic Workload Costs