We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀

The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license.

Here’s what makes it unique:
+ Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone.
+ Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.
Google launches Gemma 4 12B with native audio for laptops
- Model Size: 12 billion parameters
- Memory Requirement: 16GB VRAM or unified memory
- License: Apache 2.0
- Architecture: Unified encoder-free transformer
- Input Modalities: Text, Image, Audio
The model sits between Google's mobile-first E4B model and its high-capacity 26B Mixture of Experts (MoE) models. MoE is an architecture that activates only a subset of its parameters for each token. Google says the 12B model's performance approaches that of the 26B models at less than half the memory footprint, which lets sophisticated agentic workflows run locally without cloud infrastructure.
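The article does not say which numeric precision the 16GB figure assumes. As a rough back-of-the-envelope sketch, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The function names and the list of precisions below are illustrative assumptions, and the estimate ignores activations, the KV cache, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB.

    Ignores activations, KV cache, and framework overhead.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


def fits_in_budget(num_params: float, bits_per_param: int, budget_gib: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_memory_gib(num_params, bits_per_param) <= budget_gib


PARAMS = 12e9       # 12 billion parameters, from the article
BUDGET_GIB = 16.0   # treating the article's "16GB" as 16 GiB (an assumption)

for bits in (16, 8, 4):  # common precisions, chosen for illustration
    size = weight_memory_gib(PARAMS, bits)
    verdict = "fits" if fits_in_budget(PARAMS, bits, BUDGET_GIB) else "does not fit"
    print(f"{bits}-bit weights: {size:.1f} GiB ({verdict} in {BUDGET_GIB:.0f} GiB)")
```

Under these assumptions, 16-bit weights for 12 billion parameters would exceed a 16 GiB budget on their own, while 8-bit or 4-bit weights would leave headroom. This is only arithmetic on the published numbers, not a statement about how the released checkpoints are packaged.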
Gemma 4 12B is available under an Apache 2.0 license and runs on hardware with 16GB of VRAM or unified memory. It integrates with local tools like Ollama and LM Studio, and includes Multi-Token Prediction drafters to accelerate generation. Weights are available on Hugging Face and Kaggle for local deployment.
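For readers who want to try the Ollama route, here is a minimal sketch of a request to a locally running Ollama server, which by default listens on port 11434 and exposes a `/api/generate` endpoint. The model tag `gemma4:12b` is a hypothetical placeholder, since the article does not give the published tag; substitute whatever name `ollama list` shows after you pull the model. The helper names `build_request` and `generate` are also illustrative.

```python
import json
import urllib.request

# Default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical tag: the article does not state the real one.
MODEL_TAG = "gemma4:12b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, a call such as `generate("Summarize this changelog in three bullets.")` would return the model's reply as a string. Setting `"stream": False` asks Ollama for a single JSON object rather than a stream of partial responses, which keeps the example short.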




