Google launches Gemma 4 12B with native audio for laptops

Google

Jun 4, 2026 · Updated Jun 12, 2026

Google released Gemma 4 12B, a unified multimodal model that processes audio and vision directly within the LLM backbone. It brings near-frontier reasoning to consumer hardware, enabling complex agentic workflows to run entirely offline on standard laptops.

Google released Gemma 4 12B, a mid-sized multimodal model built on a novel encoder-free architecture. Instead of using separate encoders (specialized modules that translate sensory data), this version feeds vision and audio inputs directly into the LLM backbone. This unified approach reduces memory overhead while introducing native audio capabilities to the Gemma 4 family.

Model Size: 12 billion parameters
Memory Requirement: 16GB VRAM or unified memory
License: Apache 2.0
Architecture: Unified encoder-free transformer
Input Modalities: Text, Image, Audio

The model fills a gap between mobile-first efficiency and high-capacity reasoning. It delivers performance nearing that of the larger 26B Mixture of Experts (MoE) models at less than half the memory footprint. (An MoE architecture activates only a subset of its parameters for each token.) This enables sophisticated agentic workflows to run locally without cloud-based infrastructure.
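The memory figures can be sanity-checked with simple arithmetic: weight storage is roughly the parameter count times the bytes used per parameter. The sketch below is an illustration only. The precisions listed are assumptions, since the announcement does not say which numeric format the 16GB figure refers to, and the estimate ignores the KV cache and activations.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# The precisions are illustrative assumptions, not published Gemma 4 specs.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits(params_billions: float, precision: str, budget_gb: float) -> bool:
    """True if the weights alone fit in the budget (no KV cache, no activations)."""
    return weight_memory_gb(params_billions, precision) <= budget_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb_12 = weight_memory_gb(12, precision)
        gb_26 = weight_memory_gb(26, precision)
        ratio = gb_12 / gb_26
        print(f"{precision}: 12B ~ {gb_12:.1f} GB, 26B ~ {gb_26:.1f} GB, ratio {ratio:.2f}")
```

Under these assumptions, 12B parameters at 16-bit precision would need about 24 GB for weights alone, so a 16GB budget implies some form of quantization. At equal precision the 12B-to-26B ratio is about 0.46, which is consistent with the "less than half" claim, given that an MoE model generally keeps all of its expert weights in memory even though only some are active at a time.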

Gemma 4 12B is available under an Apache 2.0 license and runs on hardware with 16GB of VRAM. It integrates with local tools like Ollama and LM Studio, and includes Multi-Token Prediction drafters to accelerate generation. Weights are accessible via Hugging Face and Kaggle for immediate local deployment.
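For readers who want to try the Ollama route, a minimal sketch of a local request might look like the following. It uses Ollama's standard local REST endpoint (`/api/generate` on port 11434). The model tag `gemma4:12b` is a hypothetical placeholder, since the announcement does not give the exact tag, and the helper names `build_request` and `generate` are introduced here for illustration only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:12b"  # hypothetical tag; run `ollama list` to see the real name


def build_request(prompt: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming generate payload for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL) -> str:
    """Send the prompt to a locally running Ollama server and return its text."""
    data = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With an Ollama server running and the model pulled, calling `generate("Describe this release in one sentence.")` would return the model's reply as a string. Nothing leaves the machine, which is the point of the offline workflow the article describes.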

View the full update on blog.google

Google AI Developers

@googleaidevs · Jun 3

We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀

The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license.

Here’s what makes it unique:

+ Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone.
+ Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.


Still wondering? A few quick answers below.

What is Gemma 4 12B?

Gemma 4 12B is a mid-sized, open-weight multimodal model from Google DeepMind. It is designed to bridge the gap between lightweight mobile models and larger frontier systems. The model features a unified architecture that allows it to process text, images, and audio natively within a single transformer backbone.

Can Gemma 4 12B run on a laptop?

Yes, Gemma 4 12B is specifically optimized for local execution on consumer hardware. It requires approximately 16GB of VRAM or unified memory to run effectively. This allows developers to build and deploy sophisticated AI agents that operate entirely offline on standard laptops without needing cloud-based GPUs or constant internet connectivity.

What makes the Gemma 4 12B architecture unique?

Unlike traditional multimodal models that rely on separate encoders to translate sensory data, Gemma 4 12B uses an encoder-free architecture. Vision and audio inputs are projected directly into the same dimensional space as text tokens. This streamlined approach reduces memory usage and latency while enabling more efficient multimodal reasoning.
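To make "projected directly into the same dimensional space as text tokens" concrete, here is a toy sketch of the general idea: image patches pass through a single linear projection and are then placed in the same sequence as text token embeddings. This is a generic illustration, not Google's implementation. The dimensions, the random weights, and every function name are assumptions chosen for readability, and audio frames could be handled the same way.

```python
import random

D_MODEL = 8  # toy embedding width; real models use thousands of dimensions
PATCH = 2    # toy patch size: 2x2 pixels per patch
VOCAB = 16   # toy vocabulary size


def patchify(image, patch=PATCH):
    """Split a 2D grayscale image (list of rows) into flattened patch vectors.

    Assumes height and width are multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, height, patch)
        for c in range(0, width, patch)
    ]


def make_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def project(vectors, weights):
    """Linear projection: multiply each vector by an (in_dim x out_dim) matrix."""
    out_dim = len(weights[0])
    return [
        [sum(v[k] * weights[k][j] for k in range(len(v))) for j in range(out_dim)]
        for v in vectors
    ]


def build_sequence(image, token_ids, rng):
    """Return one sequence of D_MODEL-wide vectors: image patches, then text."""
    patch_proj = make_matrix(PATCH * PATCH, D_MODEL, rng)  # the only image-side layer
    embed_table = make_matrix(VOCAB, D_MODEL, rng)         # text embedding lookup
    image_part = project(patchify(image), patch_proj)
    text_part = [embed_table[t] for t in token_ids]
    return image_part + text_part
```

For a 4x4 image and three text tokens, `build_sequence` yields seven vectors of width `D_MODEL`, which a single transformer could then process together. The contrast with an encoder-based design is that no separate multi-layer vision or audio network sits in front of the backbone.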


Keep reading

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google released Gemma 4, a new family of open models built on the same architecture as Gemini 3 and licensed under Apache 2.0. These models deliver high-performance reasoning and native multimodal capabilities directly on consumer hardware, enabling private, offline agentic workflows. This shift allows developers to build sophisticated AI applications that run entirely on-device without sacrificing intelligence.

Ollama Adds Google DeepMind's Gemma 4 12B for Local Agentic AI

Ollama · Jun 7

Ollama has made Google DeepMind's Gemma 4 12B model available for local execution, including support for chat and agentic applications. This expands access to a powerful, open-weight multimodal model optimized for on-device reasoning and coding, enabling private and offline AI workflows on consumer hardware.

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google Gemma · May 29

Google released the Google AI Edge Gallery app and LiteRT-LM framework to enable fully offline agentic workflows on mobile and IoT devices. By running Gemma 4 locally, developers can build multi-step agents that plan, use tools, and process multimodal data without cloud latency or privacy risks.

Arena Ranks Google Gemma 4 as Top Open Vision Model

Arena · May 8

Google's Gemma-4-31b and Gemma-4-26b-a4b have entered the Vision Arena leaderboard as the #2 and #4 ranked open models. These releases shift the price-performance frontier by delivering vision reasoning capabilities that rival proprietary systems at a fraction of the cost.

