Start building with Gemini Embedding 2, our most capable and first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and in Vertex AI. https://t.co/jPE8KpN7Rf
Google Launches Gemini Embedding 2, Its First Multimodal Embedding Model
Google · Updated
Gemini Embedding 2 maps text, images, video, audio, and documents into a single embedding space — Google's first multimodal embedding model, now in public preview. One API call handles interleaved multimodal inputs, eliminating separate per-modality pipelines.
Most embedding pipelines require a separate model per modality, plus fusion logic to compare results across them. gemini-embedding-2-preview ingests interleaved inputs natively, collapsing that into one step and simplifying multimodal RAG, semantic search, and clustering. Output dimensions range from 128 to 3072 via Matryoshka Representation Learning, a nesting technique in which the leading components of a vector form a usable smaller embedding, so developers can trade retrieval quality against storage.
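To make the storage trade-off concrete, here is a minimal sketch of how a Matryoshka-style embedding is typically shortened on the client side: keep the first components, then rescale to unit length so cosine comparisons stay meaningful. The helper name `truncate_embedding` and the toy 8-dimensional vector are illustrative assumptions, not part of Google's API; a real vector would come from the model at up to 3072 dimensions.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale to unit length.

    Hypothetical helper for illustration; not part of any Google SDK.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 8-dimensional vector standing in for a real model output.
full = [0.5, -0.5, 0.25, 0.25, 0.1, -0.1, 0.05, 0.05]
small = truncate_embedding(full, 4)
```

Halving the dimension count halves the storage per vector. How much retrieval quality is lost at each size depends on the model and the data, so it is worth measuring on your own corpus.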
Use the Gemini API or Vertex AI to build multimodal search: query a video archive with a text prompt, or use an image to retrieve matching content. Google reports that the model outperforms leading models on text, image, and video benchmarks.
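Because every modality lands in one embedding space, cross-modal search reduces to nearest-neighbor ranking over vectors. The sketch below assumes the embeddings have already been fetched from the API (that call is omitted here) and uses made-up 3-dimensional vectors and file names purely for illustration; `cosine` and `search` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Rank items in `index` (id -> vector) by similarity to the query."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:top_k]]


# Made-up vectors: in practice each would be the model's embedding of the
# video, image, or document, all in the same space.
index = {
    "clip_001.mp4": [0.9, 0.1, 0.0],
    "photo_17.jpg": [0.1, 0.9, 0.1],
    "memo.pdf": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a text prompt.
query = [0.8, 0.2, 0.1]
results = search(query, index, top_k=2)
```

Here `results` lists the video clip first and the photo second. A brute-force scan like this suits small collections; larger archives usually move the same comparison into a vector database.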


