We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵
Google DeepMind Launches Gemini Omni to Reimagine and Edit Video Content
Google DeepMind · Updated
Google DeepMind introduced Gemini Omni Flash, a multimodal model that allows users to transform existing video scenes using natural language prompts. By combining generative media systems with Gemini's reasoning, the model can instantly swap environments or add objects while maintaining the original video's action.
- Model name: Gemini Omni Flash
- Consumer availability: Gemini App, Flow by Google, YouTube Shorts
- Developer availability: API (coming weeks)
- Core capability: Video-to-video editing and reimagining
- Architecture: Anything-to-anything multimodal generation
This launch shifts focus from pure generation to semantic video editing and world understanding. While previous releases like Veo 3.1 Lite were optimized for production efficiency, Gemini Omni builds editing and world understanding directly into the core model loop. It follows the Gemini 3.5 Flash general availability update, which optimized the model family for autonomous execution.
You can try Gemini Omni Flash today within the Gemini App, YouTube Shorts, and Flow by Google. For developers, Google plans to roll out API access for the Omni family in the coming weeks. This release follows the launch of Gemini Spark personal agents as Google expands its ecosystem of autonomous, multimodal tools.


