Add text, video, or up to five images as your ingredients and Gemini Omni can combine them all into one cohesive ten-second video. Try it today and share your creations in the replies. 👇
Google Gemini Omni Now Synthesizes Text and Images Into Cohesive Video
Google has enabled a new video composition feature for Gemini Omni that lets users mix text, video, and up to five images to create a single ten-second video. The update brings the multimodal capabilities introduced with Gemini Omni directly into the standard Gemini app and web interface for paid subscribers.
- Input limit: text, video, or up to 5 images
- Output duration: 10 seconds
- Availability: global
- Subscription tiers: AI Plus, Pro, and Ultra
- Platforms: web and mobile app
This release marks a transition from single-prompt video generation to a composition workflow in which the model reasons across multiple disparate assets. By letting users supply specific ingredients, Google is addressing the limited control that prompt-only video generation offers. The release follows the Gemini Omni integration into Flow, which introduced similar creative controls.
You can now use these tools to remix existing photos and clips into short-form content without manual editing. The feature is available globally for users on the Google AI Plus, Pro, and Ultra subscription tiers. Access is live through the Gemini mobile app and the web interface at gemini.google.com.
Google Gemini
@GeminiApp
Still wondering? A few quick answers below.
Gemini Omni is a multimodal AI model from Google that can process and generate different types of media simultaneously. It creates videos by acting as a synthesizer, taking ingredients like text descriptions, existing video clips, or up to five static images and combining them into a single, cohesive ten-second video output.
The video composition features are available to users with a paid subscription to the Google AI Plus, Pro, or Ultra tier. These subscribers can access the tool globally through the official Gemini website at gemini.google.com or via the Gemini mobile app on supported Android and iOS devices.
You can upload up to five images as reference material for a single video generation request. Gemini Omni uses these images as visual ingredients, reasoning across them to maintain consistency or combine their elements into a ten-second clip based on your text instructions or accompanying video files.
The current version of the Gemini Omni composition tool produces videos that are ten seconds long. While the model can take longer video clips as input ingredients, it distills the provided text, images, and media into a single cohesive short-form video designed for quick sharing or content remixing.
Gemini Omni functions as a composition and remixing tool that can both generate new scenes and transform existing ones. By providing an existing video as an ingredient along with new text or images, the model can reimagine the content or combine it with other assets into a new ten-second output.



