Google Gemini CLI Integrates Local Gemma Models for Intelligent Task Routing

Google Gemma

Apr 30, 2026 · Updated May 9, 2026

Gemini CLI v0.40.0 introduces experimental support for running Gemma models locally to make intelligent routing decisions. By offloading intent analysis to the user's hardware, the agent reduces its dependence on cloud APIs and cuts latency for simple tasks. This is the first step on a roadmap toward full local execution for Google's terminal-based agent.

Google released Gemini CLI v0.40.0, introducing experimental support for running Gemma models on local hardware. The update enables intelligent model routing, in which a local model analyzes the user's intent and decides how each task should be handled. This follows the recent launch of Gemma 4, which brings frontier reasoning to local devices.

This shift addresses the high latency and cost of calling cloud-based models for minor agentic decisions. With a local router, the CLI can handle lightweight decisions such as task decomposition and tool selection on-device, without per-call API fees. It mirrors a broader industry move toward hybrid architectures that balance local privacy with cloud-scale intelligence.

You can now use the gemini gemma command to set up the local model integration. Local support is currently limited to routing decisions, but the roadmap includes full local execution of agentic tasks. The update is available now in the open-source tool, offering a more private alternative to cloud-only coding assistants.

View the full update on geminicli.com

Google Gemma

@googlegemma · Apr 30

Now you can use Gemma directly in the Gemini CLI! 🚀 v0.40.0 introduces experimental support for local Gemma models, starting with intelligent model routing (with full local execution on the roadmap!). https://t.co/8tuZSNgaDP


Still wondering? A few quick answers below.

Gemini CLI is an open-source terminal-based AI agent developed by Google. It allows developers to interact with Gemini models directly from their command line to perform tasks like navigating codebases, running terminal commands, and executing multi-step agentic workflows. It serves as a developer-focused interface for Google's frontier AI models.

Local model routing uses a locally running Gemma model to analyze a user's intent before sending a request to the cloud. The local model decides how to route the task, which reduces latency and API costs for simple decisions. This hybrid approach keeps the control logic on the user's hardware while reserving cloud compute for complex reasoning.
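To make the routing idea concrete, here is a minimal Python sketch of a hybrid router. It is not Gemini CLI's implementation: the route labels, the backend names "fast-model" and "strong-model", and the keyword heuristic standing in for a local Gemma classifier are all hypothetical. What it illustrates is that the classification step runs on-device, so deciding where a request goes costs no API call; only the chosen backend would then be contacted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical route labels and backend names; illustrative only,
# not Gemini CLI configuration values.
SIMPLE = "simple"
COMPLEX = "complex"
BACKENDS = {SIMPLE: "fast-model", COMPLEX: "strong-model"}


@dataclass(frozen=True)
class RoutingDecision:
    label: str    # intent class assigned on-device
    backend: str  # backend that would receive the request


def stand_in_local_classifier(prompt: str) -> str:
    """Hypothetical stand-in for a locally running Gemma model.

    A real router would prompt the local model to classify the request.
    Here a keyword-and-length heuristic plays that role so the sketch
    runs without any model installed.
    """
    complex_markers = ("refactor", "debug", "design", "migrate", "architecture")
    text = prompt.lower()
    if len(text.split()) > 40 or any(marker in text for marker in complex_markers):
        return COMPLEX
    return SIMPLE


def route(prompt: str,
          classify: Callable[[str], str] = stand_in_local_classifier) -> RoutingDecision:
    """Classify the prompt locally, then pick a backend; no network call happens here."""
    label = classify(prompt)
    # Unknown labels fall back to the stronger backend to stay on the safe side.
    backend = BACKENDS.get(label, BACKENDS[COMPLEX])
    return RoutingDecision(label, backend)


if __name__ == "__main__":
    for p in ("list the files in src/", "refactor the auth module to use async I/O"):
        decision = route(p)
        print(f"{decision.label:<8} -> {decision.backend}: {p}")
```

The classifier is passed in as a function, which is where a real local model would plug in; keeping it injectable also lets the routing policy be tested without a model present.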

Gemini CLI is an open-source project published by Google. Unlike some proprietary AI coding assistants, its source code is publicly available, so developers can inspect, modify, and extend the tool. This openness has also drawn community interest in forking the project to support other local and third-party models beyond Google's ecosystem.

To set up local Gemma models in Gemini CLI v0.40.0, run the new gemini gemma command, which integrates a locally running model into the CLI's workflow. Once configured, the CLI can use the local model for experimental features such as intelligent routing instead of relying entirely on cloud-based inference.

Currently, Gemini CLI v0.40.0 supports local Gemma models only for intelligent routing decisions. However, the official roadmap includes plans for full local execution. This future capability would allow the agent to complete entire tasks with all model inference running on the user's machine, without calling external cloud APIs at any step.

Every HeadsUpAI update is based on its original source and reviewed before it's published.


Keep reading

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google released Gemma 4, a new family of open models built on the same architecture as Gemini 3 and licensed under Apache 2.0. These models deliver high-performance reasoning and native multimodal capabilities directly on consumer hardware, enabling private, offline agentic workflows. This shift allows developers to build sophisticated AI applications that run entirely on-device without sacrificing intelligence.

Google Gemma · May 29

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google released the Google AI Edge Gallery app and LiteRT-LM framework to enable fully offline agentic workflows on mobile and IoT devices. By running Gemma 4 locally, developers can build multi-step agents that plan, use tools, and process multimodal data without cloud latency or privacy risks.


Ollama · Jun 7

Ollama Adds Google DeepMind's Gemma 4 12B for Local Agentic AI

Ollama has made Google DeepMind's Gemma 4 12B model available for local execution, including support for chat and agentic applications. This expands access to a powerful, open-weight multimodal model optimized for on-device reasoning and coding, enabling private and offline AI workflows on consumer hardware.


Google DeepMind · Mar 28

Google Launches Gemini 3.1 Flash Live for Natural Real Time Voice Agents

Google DeepMind released Gemini 3.1 Flash Live, a low-latency audio model optimized for real-time dialogue and complex task execution. The model improves function calling and tonal recognition, allowing voice agents to handle multi-step workflows and emotional nuances more reliably. This enables more fluid interactions in noisy environments without losing conversational context.
