Google Launches Gemini Embedding 2 for Production Multimodal Search Applications

This update simplifies Retrieval-Augmented Generation (RAG) systems (which ground AI responses in external data) that handle diverse media. By mapping all modalities into one embedding space, you can perform cross-modal searches without maintaining separate, aligned pipelines. The model also uses Matryoshka Representation Learning, which allows flexible vector sizes with minimal quality loss.
Integrate these embeddings via the Gemini API or Vertex AI. The new embedding space is incompatible with previous versions, so migrating requires fully re-embedding your data. For high-volume jobs, the Batch API offers a 50% discount. Prompts must now carry task-specific prefixes instead of the legacy task_type parameter.
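As a minimal sketch of what the prefix-based approach could look like, the helper below maps legacy task_type values to prompt prefixes and prepends them to the input text. The prefix strings and the mapping itself are illustrative placeholders, not the documented ones; consult the official migration guide for the actual prefixes.

```python
# Hypothetical mapping from legacy task_type values to prompt prefixes.
# The prefix wording here is an assumption for illustration only.
TASK_PREFIXES = {
    "RETRIEVAL_QUERY": "task: search result | query: ",
    "RETRIEVAL_DOCUMENT": "task: search result | document: ",
    "CLASSIFICATION": "task: classification | text: ",
}

def build_prompt(task_type: str, text: str) -> str:
    """Prepend the task prefix that replaces the legacy task_type parameter."""
    try:
        prefix = TASK_PREFIXES[task_type]
    except KeyError:
        raise ValueError(f"Unknown task type: {task_type}")
    return prefix + text

print(build_prompt("RETRIEVAL_QUERY", "how do embeddings work?"))
```

Centralizing the mapping in one place makes the eventual swap to the real documented prefixes a one-line change per task.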
Frequently asked questions
- What is Gemini Embedding 2?
- Gemini Embedding 2 is Google's first natively multimodal embedding model. It transforms text, images, audio, video, and PDF documents into a single numerical vector space, which is a mathematical representation of data meaning. This allows for cross-modal search and clustering, enabling systems to understand semantic relationships between different types of media using one model.
- How do I migrate from Gemini Embedding 001 to Gemini Embedding 2?
- Migration requires a full re-embedding of your existing data because the vector spaces of the two models are incompatible. You must also update your code to use task-specific prompt instructions, which are text prefixes that tell the model how to optimize the vector for tasks like search or classification, replacing the legacy task_type parameter.
- What are the input limits for Gemini Embedding 2?
- The model supports a maximum input of 8,192 tokens per request. Specific modality limits include up to 6 images, 180 seconds of audio, and 120 seconds of video. For documents, you can process up to 6 PDF pages. If you need to embed longer videos, you should chunk them into overlapping segments for individual processing.
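The overlapping-segment strategy for long videos can be sketched with plain arithmetic. The function below splits a video's duration into windows no longer than the stated 120-second limit, with a configurable overlap so that content at segment boundaries appears in two embeddings; the 10-second default overlap is an assumption, not a documented recommendation.

```python
def chunk_video(duration_s: float, window_s: float = 120.0, overlap_s: float = 10.0):
    """Split a video longer than the per-request limit into overlapping
    (start, end) segments, each at most window_s seconds long."""
    if overlap_s >= window_s:
        raise ValueError("overlap must be shorter than the window")
    segments = []
    start = 0.0
    step = window_s - overlap_s  # advance by window minus overlap
    while start < duration_s:
        end = min(start + window_s, duration_s)
        segments.append((start, end))
        if end == duration_s:
            break
        start += step
    return segments

# A 300-second video at the 120-second limit with 10 s of overlap:
print(chunk_video(300))  # [(0.0, 120.0), (110.0, 230.0), (220.0, 300.0)]
```

Each segment would then be embedded in its own request, and the resulting vectors indexed with their timestamps so search hits can point back into the source video.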
- Can I change the size of the embeddings in Gemini Embedding 2?
- Gemini Embedding 2 allows you to control the output dimensionality between 128 and 3072. It uses Matryoshka Representation Learning, a technique that trains models to produce high-dimensional vectors whose initial segments are also useful on their own. The model automatically re-normalizes truncated vectors, so you can save storage space while maintaining accurate semantic similarity results.
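The math behind Matryoshka truncation is simple to reproduce client-side: keep the leading dimensions, then re-normalize to unit length so cosine similarity remains a dot product. This toy sketch assumes you request (or already have) a full-size vector; the toy values are made up, and the API's server-side dimensionality option would make this step unnecessary.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions of an MRL-trained embedding and
    re-normalize to unit length so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    # For unit-length vectors, cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))

full = [0.5, 0.5, 0.5, 0.5]          # toy 4-d unit vector
small = truncate_embedding(full, 2)   # first 2 dims, re-normalized
print(cosine(small, small))           # ~1.0 after re-normalization
```

Without the re-normalization step, truncated vectors would have norms below 1 and raw dot products would systematically understate similarity.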
- Is there a cheaper way to generate embeddings with Gemini Embedding 2?
- Developers can use the Batch API, a service for processing large groups of requests at once, for high-throughput tasks where real-time latency is not required. Using the Batch API reduces the cost of generating embeddings by 50 percent. This is particularly useful for the initial re-embedding of large datasets required when migrating between incompatible models.
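The savings are easy to quantify. The sketch below applies the 50 percent batch discount to a simple per-token cost estimate; the $0.15-per-million-tokens rate is a placeholder for illustration, not a real published price.

```python
def embedding_cost(tokens: int, price_per_million: float, batch: bool) -> float:
    """Estimate embedding cost in dollars; the Batch API halves the rate.
    price_per_million is a placeholder figure, not an actual price."""
    rate = price_per_million / 2 if batch else price_per_million
    return tokens / 1_000_000 * rate

# Re-embedding 2B tokens at a placeholder $0.15 per million tokens:
print(embedding_cost(2_000_000_000, 0.15, batch=False))  # 300.0
print(embedding_cost(2_000_000_000, 0.15, batch=True))   # 150.0
```

For a one-time migration re-embedding, where latency is irrelevant, routing everything through the batch path is the obvious choice.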

