Google Gemini Omni Now Synthesizes Text and Images Into Cohesive Video

May 27, 2026 · Updated Jun 12, 2026

Google rolled out a new video composition feature for Gemini Omni that turns text, video, and up to five images into a single ten-second clip. This shift moves AI video from simple generation to active asset remixing directly within a general-purpose assistant.

The update brings the multimodal capabilities introduced at the Gemini Omni launch directly into the standard Gemini app and web interface for paid subscribers.

Input limit: Text, video, or up to 5 images
Output duration: 10 seconds
Availability: Global
Subscription tiers: AI Plus, Pro, and Ultra
Platforms: Web and mobile app
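
The limits listed above can be written down as a small pre-flight check. The sketch below is illustrative only and is not a Google API: `CompositionRequest` and `check_request` are hypothetical names, and the rule that a request needs at least one ingredient is an assumption rather than something the announcement states.

```python
from dataclasses import dataclass, field

MAX_IMAGES = 5  # "up to 5 images" from the list above
ELIGIBLE_TIERS = {"AI Plus", "Pro", "Ultra"}  # subscription tiers from the list above


@dataclass
class CompositionRequest:
    """Hypothetical container for the ingredients of one video request."""
    tier: str
    prompt: str = ""
    video_clips: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)


def check_request(req: CompositionRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if req.tier not in ELIGIBLE_TIERS:
        problems.append(f"tier {req.tier!r} is not one of {sorted(ELIGIBLE_TIERS)}")
    if len(req.images) > MAX_IMAGES:
        problems.append(f"{len(req.images)} images given, limit is {MAX_IMAGES}")
    # Assumption: an empty request is rejected; the announcement does not say so.
    if not (req.prompt or req.video_clips or req.images):
        problems.append("at least one ingredient (text, video, or image) is needed")
    return problems
```

A request with a text prompt and two images on the Pro tier passes with no problems reported, while six images on any tier is flagged against the five-image limit.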

This release marks a transition from single-prompt video generation to a composition workflow in which the model reasons across multiple disparate assets. By letting users supply specific ingredients, Google is addressing the limited control that prompt-only video models offer over the final result. It follows the Gemini Omni integration into Flow, which introduced similar creative controls.

You can now use these tools to remix existing photos and clips into short-form content without manual editing. The feature is available globally for users on the Google AI Plus, Pro, and Ultra subscription tiers. Access is live through the Gemini mobile app and the web interface at gemini.google.com.

View the full update on gemini.google.com

Google Gemini

@GeminiApp · May 27

Add text, video, or up to five images as your ingredients and Gemini Omni can combine them all into one cohesive ten-second video. Try it today and share your creations in the replies. 👇

Still wondering? A few quick answers below.

What is Gemini Omni and how does it create videos?

Gemini Omni is a multimodal AI model from Google that can process and generate different types of media simultaneously. It creates videos by acting as a synthesizer, taking ingredients like text descriptions, existing video clips, or up to five static images and combining them into a single, cohesive ten-second video output.

Who can access the new Gemini Omni video features?

The video composition features are available to users with a paid subscription to Google AI Plus, Pro, or Ultra tiers. These subscribers can access the tool globally through the official Gemini website at gemini.google.com or via the Gemini mobile app on supported Android and iOS devices.

How many images can I use as input for Gemini Omni videos?

You can upload up to five images as reference material for a single video generation request. Gemini Omni uses these images as visual ingredients, reasoning across them to maintain consistency or combine their elements into a ten-second clip based on your text instructions or accompanying video files.

What is the maximum length of videos generated by Gemini Omni?

The current version of the Gemini Omni composition tool produces videos that are exactly ten seconds long. While the model can take longer video clips as input ingredients, it distills the provided text, images, and media into a single, short-form cohesive video designed for quick sharing or content remixing.

Can Gemini Omni edit existing videos or only create new ones?

Gemini Omni functions as a composition and remixing tool that can both generate new scenes and transform existing ones. By providing an existing video as an ingredient along with new text or images, the model can reimagine the content or combine it with other assets into a new ten-second output.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Google DeepMind Launches Gemini Omni to Reimage and Edit Video Content

Google DeepMind introduced Gemini Omni Flash, a multimodal model that allows users to transform existing video scenes using natural language prompts. By combining generative media systems with Gemini's reasoning, the model can instantly swap environments or add objects while maintaining the original video's action.

Gemini · May 29

Google Gemini Omni Brings Conversational Video Editing and Sketch to Reality

Google rolled out Gemini Omni Flash to AI subscribers and YouTube Shorts, enabling users to transform sketches and existing footage through natural language dialogue. The model uses multimodal reasoning to maintain physical consistency and character memory across multiple rounds of video edits.

Google · Mar 18

Google Launches Gemini Embedding 2, First Natively Multimodal Embedding Model

Gemini Embedding 2, now in preview via the Gemini API, is Google's first natively multimodal embedding model — enabling semantic understanding across text, images, videos, audio, and documents in a unified representation space.

Google AI Studio · Apr 15

Google AI Studio Now Generates Automated App Designs Using Gemini

Gemini now generates visual designs for applications during the build process in Google AI Studio. This update allows users to instantly apply polished UI themes to functional prototypes, bridging the gap between raw code and demo-ready software.
