We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵
Google DeepMind Launches Gemini Omni to Reimagine and Edit Video Content
Google DeepMind released Gemini Omni Flash, the first model in a new family designed to unify reasoning with generative media. Unlike standard text-to-video tools, this architecture enables anything-to-anything generation. Users provide a video and describe changes—like altering the background or inserting new elements—that the model executes instantly.
- Model name: Gemini Omni Flash
- Consumer availability: Gemini App, Flow by Google, YouTube Shorts
- Developer availability: API (coming weeks)
- Core capability: Video-to-video editing and reimagining
- Architecture: Anything-to-anything multimodal generation
This launch shifts the focus from pure generation to semantic video editing and world understanding. While previous releases like Veo 3.1 Lite were optimized for production efficiency, Gemini Omni integrates these capabilities directly into the core model loop. It follows the Gemini 3.5 Flash general availability update, which optimized the model family for autonomous execution.
You can try Gemini Omni Flash today within the Gemini App, YouTube Shorts, and Flow by Google. For developers, Google plans to roll out API access for the Omni family in the coming weeks. This release follows the launch of Gemini Spark personal agents as Google expands its ecosystem of autonomous, multimodal tools.
Google DeepMind
@GoogleDeepMind
Still wondering? A few quick answers below.
Gemini Omni is a new model family from Google DeepMind designed to unify reasoning with generative media systems. It represents a shift toward anything-to-anything generation, where the model can natively understand and create content across different formats. The first release in this family, Gemini Omni Flash, focuses specifically on advanced video generation and editing.
Gemini Omni Flash allows users to reimagine existing video content through natural language instructions. By combining generative media systems with world understanding, the model can instantly transform a scene by changing the environment, adding new objects, or creating unexpected visual elements while preserving the original action and movement captured in the user's video.
Gemini Omni Flash is currently available for users to try within the Gemini App, YouTube Shorts, and a new creative platform called Flow by Google. These integrations allow creators to use the model's video transformation capabilities directly within Google's existing ecosystem of consumer applications and creative tools.
Google DeepMind plans to roll out API access for the Gemini Omni model family in the coming weeks. This upcoming release will allow developers to integrate these multimodal capabilities into their own applications, enabling new workflows for automated video editing, scene transformation, and generative media production at scale.
While previous Gemini models focused on multimodal understanding, Gemini Omni is built to create and edit content across modalities in a single loop. It integrates generative media systems directly into the model's intelligence, moving beyond simple text-to-video generation toward a system that can semantically manipulate existing video files based on user prompts.



