Gemini Omni can transform even a basic sketch into a new reality. Try it for yourself in the Gemini app: upload a video of someone drawing a circle, then enter this prompt: "When I finish drawing the circle, it becomes ___."
Google Gemini Omni Brings Conversational Video Editing and Sketch to Reality
Google launched Gemini Omni Flash, a multimodal model that generates and edits high-quality video from any combination of text, images, and audio. The system uses an 'anything-to-anything' architecture to synthesize these inputs into cohesive clips, maintaining consistent physics and character details across the scene.
- Availability: AI Plus, Pro, and Ultra subscribers
- Platforms: Gemini app, Google Flow, YouTube Shorts, and more
- Input modalities: Text, Image, Audio, Video
- Watermarking: SynthID digital watermark
- Developer access: API (coming weeks)
This rollout shifts AI video from one-shot generation to an iterative, conversational workflow. By grounding generation in Gemini reasoning, the model maintains consistent physics and character details across multiple turns. The rollout expands on the initial Gemini Omni Flash launch by moving the technology from a research preview into mass-market platforms like YouTube Shorts.
You can now access Gemini Omni Flash through the Gemini app and Google Flow with a Google AI Plus, Pro, or Ultra subscription. The model is also rolling out at no cost to YouTube Shorts users this week, supporting multi-turn editing where each prompt builds on the last.
Still wondering? A few quick answers below.
Gemini Omni is a new family of multimodal models from Google designed to create and edit content across text, images, audio, and video. The first model, Gemini Omni Flash, focuses on high-speed video generation and conversational editing, allowing users to transform existing footage or sketches into new realities using natural language instructions.
Conversational editing allows you to refine videos through a multi-turn dialogue where each instruction builds on the previous one. The model uses Gemini reasoning to maintain character consistency and physical laws like gravity and fluid dynamics. You can change specific objects, transform environments, or adjust camera angles while the model remembers the original scene context.
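The idea that each instruction builds on the previous one can be illustrated with a small sketch. This is not Google's API, which had not been released to developers at the time of writing; the `EditSession` class, its methods, and the file name `circle_sketch.mp4` are all hypothetical, and the code only models how a dialogue might accumulate context. It does not generate or edit any video.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical container for a multi-turn video editing dialogue."""

    source_clip: str  # label for the original footage (illustrative only)
    instructions: list[str] = field(default_factory=list)

    def add_turn(self, instruction: str) -> str:
        """Record a new instruction and return the cumulative request."""
        self.instructions.append(instruction)
        return self.cumulative_request()

    def cumulative_request(self) -> str:
        """Combine the original clip with every instruction so far, in order."""
        steps = "; then ".join(self.instructions)
        return f"{self.source_clip}: {steps}"


# Two turns: the second request still carries the first instruction.
session = EditSession(source_clip="circle_sketch.mp4")
session.add_turn("turn the circle into a full moon")
request = session.add_turn("move the camera to a low angle")
print(request)
```

In this toy model, the second turn does not replace the first. The request sent on turn two contains both instructions plus the original clip, which is the property the article describes as the model remembering the original scene context.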
Gemini Omni Flash is currently available to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. It is also rolling out at no cost to users on YouTube Shorts and the YouTube Create app. Developers and enterprise customers will gain access to the model via APIs in the coming weeks.
Gemini Omni features a sketch-to-video capability that transforms basic drawings, or videos of someone sketching, into realistic footage. When you provide a video of a drawing and a prompt describing the desired outcome, the model uses the sketch as a guide for movement and structure to generate a fully rendered video sequence.
All videos created with Gemini Omni include SynthID, an imperceptible digital watermark that allows users to verify if content was AI-generated. Google also restricts certain features, such as the ability to edit speech or audio in existing videos, while it continues to test these capabilities for responsible deployment and protection against potential harm.





