Ollama Adds Google DeepMind's Gemma 4 12B for Local Agentic AI

Ollama

Jun 7, 2026 · Updated Jun 20, 2026

Ollama has made Google DeepMind's Gemma 4 12B model available for local execution, including support for chat and agentic applications. This expands access to a powerful, open-weight multimodal model optimized for on-device reasoning and coding, enabling private and offline AI workflows on consumer hardware.

Ollama has added Google DeepMind's Gemma 4 12B model to its library, enabling local execution of the multimodal model. The integration supports chat and agentic tools like Hermes Agent and Claude Code. The gemma4:12b-mlx variant was initially available via MLX, with the gemma4:12b model now accessible across all platforms.

Local Size: 7.6GB
Variants: gemma4:12b, gemma4:12b-mlx
Agentic Apps: Hermes Agent, Claude Code, and more
Access: ollama run (chat), ollama launch (agents)

The Gemma 4 family of models, built by Google DeepMind, is designed for frontier-level performance in reasoning, agentic workflows, coding, and multimodal understanding. These models are optimized for efficient local execution on consumer hardware, bringing advanced capabilities like native function-calling and system prompt support directly to users' machines. This aligns with a broader trend of making powerful AI models available for on-device, private, and offline use.

The Gemma 4 12B model can be run locally via Ollama using simple commands, allowing for direct chat interactions or integration into agentic tools. This enables developers and users to build and experiment with sophisticated AI applications that leverage Google's Gemma 4 12B capabilities without relying on cloud APIs. Ollama's support for agentic coding tools continues to expand these local options.

View the full update on ollama.com

ollama

@ollamaJun 3

.@GoogleDeepMind's Gemma 4 - 12B is available on Ollama! Chat: ollama run gemma4:12b-mlx Hermes Agent: ollama launch hermes --model gemma4:12b-mlx Claude Code: ollama launch claude --model gemma4:12b-mlx and more 👇👇👇 (Note, this currently works via MLX) https://t.co/BWmHT9w33m

1351.3k

View on X

Still wondering? A few quick answers below.

Gemma 4 12B is a multimodal model from Google DeepMind designed for reasoning, agentic workflows, coding, and multimodal understanding. It processes text and image inputs and generates text outputs, with configurable thinking modes for enhanced reasoning.

Gemma 4 models offer advanced reasoning, extended multimodality for text and image, diverse architectures including Mixture-of-Experts, and optimizations for on-device execution. They also feature increased context windows, enhanced coding and agentic capabilities with native function-calling, and native system prompt support.

You can run Gemma 4 12B locally using Ollama. After installing Ollama, you can use the command ollama run gemma4:12b for chat interactions or ollama launch [agent_name] --model gemma4:12b to integrate it with supported agentic applications like Claude Code or Hermes Agent.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Ollama →

Keep reading

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google released Gemma 4, a new family of open models built on the same architecture as Gemini 3 and licensed under Apache 2.0. These models deliver high-performance reasoning and native multimodal capabilities directly on consumer hardware, enabling private, offline agentic workflows. This shift allows developers to build sophisticated AI applications that run entirely on-device without sacrificing intelligence.

Google GemmaMay 29

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google released the Google AI Edge Gallery app and LiteRT-LM framework to enable fully offline agentic workflows on mobile and IoT devices. By running Gemma 4 locally, developers can build multi-step agents that plan, use tools, and process multimodal data without cloud latency or privacy risks.

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

OllamaApr 24

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama added the Qwen 3.6 27B model to its library, enabling local execution of the latest open-weight coding model. The update introduces direct integration with agentic frameworks like OpenClaw and Claude Code, allowing developers to run autonomous coding workflows entirely on local hardware.

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

VercelApr 2

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

Vercel now supports Google's Gemma 4 models on its AI Gateway, offering native function calling and structured JSON output for building autonomous agents. These 26B and 31B models feature a 256K context window and are built on the same architecture as Gemini 3. This integration allows developers to deploy high-performance open models with enterprise-grade reliability and no price markup.

What is Gemma 4 12B?

What are the key capabilities of Gemma 4 models?

How can I run Gemma 4 12B locally?

Keep reading

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

Keep reading

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows