Google DeepMind Launches Gemini Omni to Reimagine and Edit Video Content

Google DeepMind

May 19, 2026 · Updated Jun 13, 2026

Google DeepMind introduced Gemini Omni Flash, a multimodal model that allows users to transform existing video scenes using natural language prompts. By combining generative media systems with Gemini's reasoning, the model can instantly swap environments or add objects while maintaining the original video's action.

Google DeepMind released Gemini Omni Flash, the first model in a new family designed to unify reasoning with generative media. Unlike standard text-to-video tools, this architecture enables anything-to-anything generation. Users provide a video and describe changes, such as altering the background or inserting new elements, and the model executes them instantly.

Model name: Gemini Omni Flash
Consumer availability: Gemini App, Flow by Google, YouTube Shorts
Developer availability: API (coming weeks)
Core capability: Video-to-video editing and reimagining
Architecture: Anything-to-anything multimodal generation

This launch shifts the focus from pure generation to semantic video editing and world understanding. While previous releases like Veo 3.1 Lite optimized for production efficiency, Gemini Omni integrates these capabilities directly into the core model loop. It follows the Gemini 3.5 Flash general availability update, which optimized the model family for autonomous execution.

You can try Gemini Omni Flash today within the Gemini App, YouTube Shorts, and Flow by Google. For developers, Google plans to roll out API access for the Omni family in the coming weeks. This release follows the launch of Gemini Spark personal agents as Google expands its ecosystem of autonomous, multimodal tools.

Google DeepMind (@GoogleDeepMind), May 19

We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵


Still wondering? A few quick answers below.

What is Gemini Omni?

Gemini Omni is a new model family from Google DeepMind designed to unify reasoning with generative media systems. It represents a shift toward anything-to-anything generation, where the model can natively understand and create content across different formats. The first release in this family, Gemini Omni Flash, focuses specifically on advanced video generation and editing.

How does video editing work in Gemini Omni Flash?

Gemini Omni Flash allows users to reimagine existing video content through natural language instructions. By combining generative media systems with world understanding, the model can instantly transform a scene by changing the environment, adding new objects, or creating unexpected visual elements while preserving the original action and movement captured in the user's video.

Where can I use Gemini Omni Flash right now?

Gemini Omni Flash is currently available for users to try within the Gemini App, YouTube Shorts, and a new creative platform called Flow by Google. These integrations allow creators to use the model's video transformation capabilities directly within Google's existing ecosystem of consumer applications and creative tools.

When will the Gemini Omni API be available for developers?

Google DeepMind plans to roll out API access for the Gemini Omni model family in the coming weeks. This upcoming release will allow developers to integrate these multimodal capabilities into their own applications, enabling new workflows for automated video editing, scene transformation, and generative media production at scale.

What is the difference between Gemini Omni and previous Gemini models?

While previous Gemini models focused on multimodal understanding, Gemini Omni is built to create and edit content across modalities in a single loop. It integrates generative media systems directly into the model's intelligence, moving beyond simple text-to-video generation toward a system that can semantically manipulate existing video files based on user prompts.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.


