HeadsUpAI

Google Launches On-Device Agent Skills for Offline Gemma 4 Workflows

Google launched the Google AI Edge Gallery app and the LiteRT-LM framework for on-device agentic workflows. The release introduces Agent Skills, which let Gemma 4 models execute multi-step plans and process images entirely offline, building on the Gemma 4 model family.
Model size (E2B): 2.58 GB
Memory footprint (E2B): under 1.5 GB
Context window: 128K tokens
GPU speedup (Multi-Token Prediction): up to 2.2x
Hardware support: Android, iOS, Raspberry Pi 5, and more

Running agents on-device removes the network latency and per-token costs of cloud-based agents. By optimizing the E2B and E4B models for local inference, Google enables private automation on smartphones and the Raspberry Pi 5. The release moves Google's offline multimodal reasoning from an experimental prototype to a production-ready stack.

Developers can access these capabilities through the Google AI Edge Gallery app or the litert-lm Python package. A new Multi-Token Prediction (MTP) feature provides up to a 2.2x speedup on mobile GPUs by adopting the speculative decoding approach used by the Gemma 4 drafter models. The models are released under the Apache 2.0 license.
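Speculative decoding, which Multi-Token Prediction is described as adopting, can be illustrated with a toy draft-and-verify loop. The sketch below is a minimal, generic greedy version, not LiteRT-LM's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for a large model and a cheap drafter. A cheap model proposes `k` tokens, the large model keeps the longest prefix it agrees with plus one token of its own, and the output is identical to plain greedy decoding with the large model.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding (toy illustration, not LiteRT-LM code)."""
    seq = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(seq) < goal:
        # 1. The cheap draft model proposes k tokens, one after another.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target model checks each drafted position. A real runtime
        #    scores all k positions in one batched forward pass; this loop
        #    is only for readability.
        accepted, ctx = [], list(seq)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # every draft accepted: one bonus token
        seq.extend(accepted)
    return seq[:goal]


# Hypothetical toy models standing in for a large model and its drafter.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + len(seq)) % 10


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every fifth position.
    guess = toy_target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1, 2], max_new_tokens=12))
    # [1, 2, 8, 7, 5, 0, 6, 5, 3, 8, 4, 3, 1, 6]
```

Every round appends at least one token, and more when the drafter guesses well. The real speedup comes from the large model verifying all drafted tokens in a single pass, which this readable loop does not model.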

Google Gemma (@googlegemma) on X:

A completely local agent that lives right inside your pocket. 📱 Watch Gemma 4 run 100% locally in the Google AI Edge Gallery app. It converts images into JSON schemas, transcribes audio, and uses agent skills to interact with apps, all entirely offline. https://t.co/bou7Pucbkd


Still wondering? A few quick answers below.

Agent Skills are a new capability for Gemma 4 models that enables multi-step, autonomous workflows to run entirely on-device. These skills allow the model to perform complex tasks such as querying local knowledge bases, generating interactive visualizations, and managing end-to-end application workflows without an internet connection or cloud-based processing.
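The source does not describe how Agent Skills are declared or invoked, so the following is only a generic, hypothetical sketch of the pattern: a registry of local functions ("skills") and a runner that executes a multi-step plan, feeding earlier results into later steps. `SkillRunner`, `Step`, `query_notes`, and `make_checklist` are invented names, not part of the Google AI Edge Gallery or LiteRT-LM APIs. In a real agent the model would generate the plan; here it is hard-coded.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    skill: str             # name of a registered skill
    args: Dict[str, Any]   # arguments; "$name" refers to an earlier step's result
    save_as: str           # key under which this step's result is stored


@dataclass
class SkillRunner:
    skills: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.skills[name] = fn

    def run(self, plan: List[Step]) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        for step in plan:
            if step.skill not in self.skills:
                raise KeyError(f"unknown skill: {step.skill}")
            # Substitute outputs of earlier steps for "$name" placeholders.
            args = {
                key: results[val[1:]] if isinstance(val, str) and val.startswith("$") else val
                for key, val in step.args.items()
            }
            results[step.save_as] = self.skills[step.skill](**args)
        return results


# Hypothetical on-device skills: a local notes lookup and a formatter.
NOTES = {"garden": ["water tomatoes", "buy mulch"], "work": ["send report"]}


def query_notes(topic: str) -> List[str]:
    return NOTES.get(topic, [])


def make_checklist(items: List[str]) -> str:
    return "\n".join(f"[ ] {item}" for item in items)


if __name__ == "__main__":
    runner = SkillRunner()
    runner.register("query_notes", query_notes)
    runner.register("make_checklist", make_checklist)
    plan = [
        Step("query_notes", {"topic": "garden"}, save_as="notes"),
        Step("make_checklist", {"items": "$notes"}, save_as="checklist"),
    ]
    print(runner.run(plan)["checklist"])
```

Because every skill is an ordinary local function, the whole plan runs without network access, which is the property the answer above emphasizes.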

LiteRT-LM is a specialized library for running generative AI models efficiently on edge devices. It introduces Multi-Token Prediction, which accelerates decode speeds by up to 2.2x on mobile GPUs. It also lets the Gemma 4 E2B model run with a memory footprint of under 1.5 GB by using quantization and memory-mapping techniques.
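The answer mentions memory-mapping without detail, and the source does not say which parts of the model LiteRT-LM maps. As a generic illustration only, the sketch below maps a hypothetical float32 weights file with Python's standard `mmap` module and reads one slice. The operating system loads only the pages backing the bytes that are touched (for example, a few rows of a large lookup table), and read-only file-backed pages can be dropped and re-read under memory pressure. The file layout and the helper names `write_fake_weights` and `read_slice` are assumptions.

```python
import mmap
import os
import struct
import tempfile

ITEM_SIZE = 4  # bytes per float32 in the hypothetical file layout


def write_fake_weights(path: str, n: int) -> None:
    """Write n little-endian float32 values 0.0, 1.0, 2.0, ... as a stand-in weights file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n}f", *map(float, range(n))))


def read_slice(path: str, start: int, count: int) -> list:
    """Read `count` float32 values starting at element `start` through mmap.

    The file is mapped, not read: the OS loads only the pages that back the
    requested bytes, so untouched parts of a large file need not occupy RAM.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return list(struct.unpack_from(f"<{count}f", mm, start * ITEM_SIZE))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_fake_weights(path, 100_000)   # 400,000-byte file
        print(read_slice(path, 50_000, 3))  # [50000.0, 50001.0, 50002.0]
```

Quantization, the other technique named above, is complementary: it shrinks the bytes per weight, while mapping controls which of those bytes are resident at a given moment.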

Gemma 4 is available for local use through the Google AI Edge Gallery app on both Android and iOS. Developers can also access the model system-wide on Android through the AICore Developer Preview. These tools run inference 100% offline, so user data stays on the device and there is no cloud round-trip latency.

Gemma 4 supports a broad range of hardware, including Android and iOS mobile devices and Windows, Linux, and macOS desktops. It is also optimized for IoT and robotics platforms such as the Raspberry Pi 5 and the Qualcomm Dragonwing IQ8 processor, which powers the new Arduino VENTUNO Q for edge computing.

The Gemma 4 family of models is released under the Apache 2.0 license. This permissive license lets developers and researchers use, modify, and distribute the open-weight models for both research and commercial applications across mobile, desktop, and IoT platforms, subject to its attribution and notice conditions rather than the restrictive terms often found in proprietary AI licenses.
