Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Q: What are Google Gemma 4 Agent Skills?

Agent Skills are a new capability for Gemma 4 models that enable multi-step, autonomous workflows to run entirely on-device. These skills allow the AI to perform complex tasks like querying local knowledge bases, generating interactive visualizations, and managing end-to-end application workflows without needing an internet connection or cloud-based processing.

Q: How does LiteRT-LM improve Gemma 4 performance?

LiteRT-LM is a specialized library that optimizes generative AI for edge devices. It introduces Multi-Token Prediction, which accelerates decode speeds by up to 2.2x on mobile GPUs. It also enables the Gemma 4 E2B model to run with a minimal memory footprint of less than 1.5GB using advanced quantization and memory-mapping techniques.

Q: Is Gemma 4 available for local use on mobile devices?

Yes, Gemma 4 is available for local use through the Google AI Edge Gallery app on both Android and iOS. Developers can also access the model system-wide on Android via the AICore Developer Preview. These tools allow for 100% offline inference, ensuring user data privacy and eliminating cloud-related latency.

Q: What hardware platforms support Gemma 4 and LiteRT-LM?

Gemma 4 runs on a wide range of hardware, including Android and iOS mobile devices and Windows, Linux, and macOS desktops. It is also optimized for IoT and robotics platforms such as the Raspberry Pi 5 and the Qualcomm Dragonwing IQ8 processor, which powers the new Arduino VENTUNO Q for edge computing.

Q: What is the licensing for Google Gemma 4?

The Gemma 4 family of models is released under the Apache 2.0 license. This open-weight license allows developers and researchers to freely use, modify, and distribute the models for both research and commercial applications across mobile, desktop, and IoT platforms without the restrictive terms often found in proprietary AI licenses.

Google Gemma

May 29, 2026 · Updated Jun 13, 2026

Google released the Google AI Edge Gallery app and LiteRT-LM framework to enable fully offline agentic workflows on mobile and IoT devices. By running Gemma 4 locally, developers can build multi-step agents that plan, use tools, and process multimodal data without cloud latency or privacy risks.

Google launched the Google AI Edge Gallery app and LiteRT-LM framework for on-device agentic workflows. This release introduces Agent Skills, allowing Gemma 4 models to execute multi-step plans and process images entirely offline. It builds on the Gemma 4 model family for autonomous execution.
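A multi-step agent of this kind can be pictured as a loop that runs one local skill at a time and feeds each result into the next step. The sketch below is only an illustration under stated assumptions: `Skill`, `run_agent`, the `"$prev"` placeholder, and the two example skills are hypothetical names invented here, not part of Agent Skills or LiteRT-LM, and the plan is hard-coded where a real agent would have the model produce it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Skill:
    """A hypothetical local tool the agent may call (not a real Agent Skills type)."""
    name: str
    run: Callable[[str], str]


def lookup_note(query: str) -> str:
    # Stand-in for a local knowledge-base query; the data is made up.
    notes = {"wifi": "Router password is on the fridge."}
    return notes.get(query, "no match")


def shout(text: str) -> str:
    # Stand-in for a post-processing skill.
    return text.upper()


def run_agent(plan: List[Tuple[str, str]], skills: Dict[str, Skill]) -> List[str]:
    """Run each (skill name, argument) step in order, entirely in-process.

    The argument "$prev" is replaced by the previous step's output, which is
    what makes the workflow multi-step rather than a set of independent calls.
    """
    outputs: List[str] = []
    for skill_name, arg in plan:
        if skill_name not in skills:
            raise KeyError(f"unknown skill: {skill_name}")
        if arg == "$prev":
            if not outputs:
                raise ValueError("'$prev' used in the first step")
            arg = outputs[-1]
        outputs.append(skills[skill_name].run(arg))
    return outputs
```

Calling `run_agent([("lookup_note", "wifi"), ("shout", "$prev")], skills)` with both skills registered returns the note followed by its upper-cased form. Nothing in the loop touches the network, which is the property the release emphasizes.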

Model size (E2B): 2.58 GB
Memory footprint (E2B): < 1.5 GB
Context window: 128K tokens
GPU speedup (MTP): Up to 2.2x
Hardware support: Android, iOS, Raspberry Pi 5, and more
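The gap between a 2.58 GB model file and a memory footprint under 1.5 GB is what one would expect when weights are memory-mapped rather than read into RAM in full; the article attributes the footprint to quantization and memory-mapping but does not describe how LiteRT-LM implements either. As a rough sketch of the general idea only, using Python's standard `mmap` module and a throwaway file in place of real weights (the helper names are hypothetical):

```python
import mmap
import os
import tempfile


def make_demo_file(size: int) -> str:
    """Write a throwaway file of `size` bytes with a repeating 0..255 pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes at `offset` without reading the whole file.

    The file is mapped read-only; the operating system pages in only the
    regions that are actually touched, so resident memory can stay well
    below the file size.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]
```

A caller would map the file once and slice out only the region it needs, for example `read_slice(path, 256, 4)`. Real runtimes keep the mapping open for the life of the model instead of reopening it per read.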

This shift removes the network latency and per-token costs of cloud-based agents. By optimizing the E2B and E4B models for local inference, Google enables private automation on smartphones and the Raspberry Pi 5. The release moves Google's offline multimodal reasoning tests from an experimental prototype to a production-ready stack.

Access these capabilities through the Google AI Edge Gallery or the litert-lm Python package. A new Multi-Token Prediction feature provides up to a 2.2x speedup on mobile GPUs, adopting the speculative decoding logic found in Gemma 4 drafter models. The models use an Apache 2.0 license.
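The speculative decoding idea mentioned here can be sketched in a few lines: a cheap drafter proposes several tokens, the larger target model checks them, and the longest agreeing prefix is kept. This is a generic greedy draft-and-verify toy, not Google's Multi-Token Prediction implementation; `toy_target`, `toy_drafter`, and the pass counter are hypothetical stand-ins, and the bonus token that real implementations emit after a fully accepted draft is omitted for brevity.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    # Stand-in for the large model's greedy next token.
    return (seq[-1] * 3 + 1) % 7


def toy_drafter(seq: List[int]) -> int:
    # Agrees with the target except after token 5, to force some rejections.
    return 0 if seq[-1] == 5 else toy_target(seq)


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft up to k tokens, verify against the target, keep the agreeing prefix.

    Returns the sequence and the number of verification rounds. A real system
    scores all drafted positions in one batched target pass per round; this
    toy calls the target per position but counts each round as one pass.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        draft: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            draft.append(drafter(seq + draft))
        target_passes += 1
        for tok in draft:
            expected = target(seq)
            seq.append(expected)  # accepted draft token, or the target's correction
            if expected != tok:
                break  # discard the rest of the draft
    return seq, target_passes
```

Because every appended token is the target's own greedy choice, the output matches plain greedy decoding exactly; any speedup comes from needing fewer target passes when the drafter guesses well.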

View the full update on developers.googleblog.com

Google Gemma (@googlegemma) · May 29

A completely local agent that lives right inside your pocket. 📱 Watch Gemma 4 run 100% locally in the Google AI Edge Gallery app. It converts images into JSON schemas, transcribes audio, and uses agent skills to interact with apps, all entirely offline. https://t.co/bou7Pucbkd



