Google Gemini Omni Brings Conversational Video Editing and Sketch to Reality

Gemini

May 29, 2026 · Updated Jun 12, 2026

Google rolled out Gemini Omni Flash to AI subscribers and YouTube Shorts, enabling users to transform sketches and existing footage through natural language dialogue. The model uses multimodal reasoning to maintain physical consistency and character memory across multiple rounds of video edits.

Google launched Gemini Omni Flash, a multimodal model that generates and edits high-quality video from any combination of text, images, audio, and existing video. The system uses an 'anything-to-anything' architecture to synthesize these inputs into cohesive clips, maintaining consistent physics and character details across the scene.

Availability: AI Plus, Pro, and Ultra subscribers
Platforms: Gemini app, Google Flow, YouTube Shorts, YouTube Create app, and more
Input modalities: Text, Image, Audio, Video
Watermarking: SynthID digital watermark
Developer access: API (coming weeks)

This rollout shifts AI video from one-shot generation to an iterative, conversational workflow. Because generation is grounded in Gemini's reasoning, the model keeps physics and character details consistent across multiple turns. It also extends the initial Gemini Omni Flash launch, moving the technology from a research preview into mass-market platforms like YouTube Shorts.

You can now access Gemini Omni Flash through the Gemini app and Google Flow with a Google AI Plus, Pro, or Ultra subscription. The model is also rolling out at no cost to YouTube Shorts users this week, supporting multi-turn editing where each prompt builds on the last.

View the full update on blog.google

Google Gemini (@GeminiApp) · May 29

Gemini Omni can transform even a basic sketch into a new reality. Try for yourself in the Gemini app. Upload a video of someone drawing a circle and then enter this prompt: When I finish drawing the circle, it becomes ___.


Still wondering? A few quick answers below.

What is Gemini Omni?

Gemini Omni is a new family of multimodal models from Google designed to create and edit content across text, images, audio, and video. The first model, Gemini Omni Flash, focuses on high-speed video generation and conversational editing, allowing users to transform existing footage or sketches into new realities using natural language instructions.

How does conversational video editing work in Gemini Omni?

Conversational editing allows you to refine videos through a multi-turn dialogue where each instruction builds on the previous one. The model uses Gemini reasoning to maintain character consistency and physical laws like gravity and fluid dynamics. You can change specific objects, transform environments, or adjust camera angles while the model remembers the original scene context.

Who can access Gemini Omni Flash right now?

Gemini Omni Flash is currently available to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. It is also rolling out at no cost to users on YouTube Shorts and the YouTube Create app. Developers and enterprise customers will gain access to the model via APIs in the coming weeks.

Can Gemini Omni turn a drawing into a video?

Yes, Gemini Omni features a sketch-to-video capability that transforms basic drawings or videos of someone sketching into realistic footage. By providing a video of a drawing and a prompt describing the desired outcome, the model uses the sketch as a guide for movement and structure to generate a fully rendered video sequence.

How does Google handle safety and transparency for Gemini Omni videos?

All videos created with Gemini Omni include SynthID, an imperceptible digital watermark that allows users to verify if content was AI-generated. Google also restricts certain features, such as the ability to edit speech or audio in existing videos, while it continues to test these capabilities for responsible deployment and protection against potential harm.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Google DeepMind Launches Gemini Omni to Reimage and Edit Video Content

Google DeepMind introduced Gemini Omni Flash, a multimodal model that allows users to transform existing video scenes using natural language prompts. By combining generative media systems with Gemini's reasoning, the model can instantly swap environments or add objects while maintaining the original video's action.

Google Gemini Omni Now Synthesizes Text and Images Into Cohesive Video

Gemini · May 29

Google rolled out a new video composition feature for Gemini Omni that turns text, video, and up to five images into a single ten-second clip. This shift moves AI video from simple generation to active asset remixing directly within a general-purpose assistant.

Google Launches Gemini 3.5 Powered Search with Multimodal Agentic Reasoning

Google · May 20

Google launched a unified AI Search experience that merges AI Overviews and AI Mode into a single conversational interface powered by Gemini 3.5. The update enables users to query across text, images, files, and video while maintaining persistent context for follow-ups.
