Google Launches Gemini Embedding 2, First Natively Multimodal Embedding Model

Google

Mar 18, 2026 · Updated Apr 25, 2026

Gemini Embedding 2, now in preview via the Gemini API, is Google's first natively multimodal embedding model — enabling semantic understanding across text, images, videos, audio, and documents in a unified representation space.

The model is exposed through the API as gemini-embedding-2-preview. It accepts text, images, videos, audio, and documents, and a single model produces embeddings that capture semantic meaning across all of those input types.

Most production retrieval and search systems handle modalities separately — text embeddings in one system, image embeddings in another — requiring stitched-together pipelines for cross-modal queries. Gemini Embedding 2 collapses that into a unified representation space, making cross-modal semantic search achievable in a single model.

The model is available now in preview through the Gemini API. To evaluate cross-modal retrieval, run it against a corpus that mixes your existing content types.
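As a rough illustration of what a unified representation space allows, the sketch below ranks a mixed-content corpus against a single query using cosine similarity. It does not call the Gemini API: the file names, the three-dimensional vectors, and the `Item`, `cosine`, and `search` helpers are all hypothetical placeholders standing in for real embeddings, which would be much longer vectors returned by the model.

```python
import math
from typing import NamedTuple


class Item(NamedTuple):
    name: str             # hypothetical file name
    modality: str         # "text", "image", "audio", "document", ...
    vector: list[float]   # placeholder embedding, NOT real model output


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, corpus, top_k=2):
    """Return the top_k (score, item) pairs, highest similarity first."""
    scored = [(cosine(query_vector, item.vector), item) for item in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy corpus: every modality lives in the same vector space, so one
# similarity function can compare all of them. Values are made up.
CORPUS = [
    Item("dog_photo.jpg", "image", [0.9, 0.1, 0.0]),
    Item("dog_training_guide.pdf", "document", [0.8, 0.2, 0.1]),
    Item("jazz_clip.mp3", "audio", [0.0, 0.1, 0.9]),
    Item("tax_memo.txt", "text", [0.1, 0.9, 0.1]),
]

# Stand-in for the embedding of a text query such as "dogs".
QUERY = [1.0, 0.0, 0.0]

RESULTS = search(QUERY, CORPUS)
for score, item in RESULTS:
    print(f"{score:.3f}  {item.modality:<8}  {item.name}")
```

With separate per-modality models, the image and document vectors would sit in different spaces and could not be scored against the same text query this way. In a real test, the placeholder vectors would be replaced with embeddings produced by the preview model for each file.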

View the full update on ai.google.dev

Google AI

@GoogleAI · Mar 13

Here’s everything that happened this week 🚀:
- @GoogleMaps released 2 new features, Ask Maps to handle your most complex questions about places and trips and Immersive Navigation for intuitive routes, all with some help from the latest Gemini models
- New Gemini features rolled out to @GoogleWorkspace, making @GoogleDocs, Sheets, Slides, and @GoogleDrive more helpful
- In collaboration with Imperial College London and the UK’s NHS, we published breast cancer research that demonstrates AI’s potential to detect 25% of interval cancers previously missed by conventional methods
- We introduced Gemini Embedding 2 (in preview), our first natively multimodal embedding model, which enables semantic understanding across text, images, videos, audio, and documents inputs all in a single model
- We also launched project spend caps for the Gemini API in @GoogleAIStudio, enabling you to set a dollar amount for maximum spend at https://t.co/NApz8LVHll
- Gemini in @GoogleChrome began rolling out on desktop to signed-in users (18+) in India, New Zealand, and Canada, with expansions to mobile and more regions and languages coming throughout the year


