We’re expanding the Gemini API File Search tool 🔍 with 3 new updates that enable developers to more easily build multimodal RAG systems with enhanced precision:

- Multimodal Support: By leveraging our Gemini Embedding 2 model, File Search can now reason across image and text simultaneously.
- Custom Metadata Filtering: Bring structure to unstructured data by tagging files with custom key-value labels. This pre-filters your data and boosts search speed.
- Exact citations: File Search can now capture and return the exact source (down to the page number) for every piece of information indexed.

See multimodal File Search in action with our example app in @GoogleAIStudio. Chat with your entire image and doc library, ask questions, and trace answers back to the source: https://t.co/WJFiPPpyRF

Google Adds Multimodal RAG and Page Citations to Gemini File Search
- Embedding model: Gemini Embedding 2
- Supported modalities: Text and images
- Citation granularity: Page-level
- Filtering method: Custom metadata key-value labels
- Availability: Gemini API and Google AI Studio
Building production-grade RAG for visual data has traditionally required complex custom pipelines. By moving multimodal reasoning and custom metadata filtering into a managed tool, Google is making it simpler to give agents a searchable "photographic memory" of images and documents. Metadata filters narrow a query to the relevant files before retrieval runs, which reduces noise and improves retrieval speed at scale.
You can now attach key-value labels to files to pre-filter data and use page-level citations to verify answers. These features are available via the Gemini API and Google AI Studio, matching Google AI Studio's visual Edit Mode. Developers can initialize a multimodal store using the genai client to start indexing mixed-modality libraries today.
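To make the two ideas concrete, here is a minimal plain-Python sketch of how key-value pre-filtering and page-level citations can fit together in a retrieval step. It does not call the Gemini API and is not Google's implementation. The `Chunk` class, the `search` function, the file names, the `team` label, and the two-dimensional embeddings are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List


@dataclass
class Chunk:
    """One indexed chunk. Every field is a hypothetical stand-in."""
    file_name: str
    page: int
    text: str
    embedding: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_embedding, filters, top_k=2):
    # Step 1: pre-filter. Keep only chunks whose labels match every filter,
    # so the similarity ranking never sees irrelevant files.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in filters.items())
    ]
    # Step 2: rank the remaining chunks by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    # Step 3: return each hit together with a page-level citation.
    return [
        {"text": c.text, "citation": f"{c.file_name}, p. {c.page}"}
        for c in ranked[:top_k]
    ]


# Toy library with made-up files, labels, and embeddings.
library = [
    Chunk("q3_report.pdf", 4, "Revenue grew in Q3.", [0.9, 0.1], {"team": "finance"}),
    Chunk("q3_report.pdf", 7, "Headcount was flat.", [0.2, 0.8], {"team": "finance"}),
    Chunk("roadmap.pdf", 2, "Revenue targets for next year.", [0.95, 0.05], {"team": "product"}),
]

results = search(library, [1.0, 0.0], {"team": "finance"})
for hit in results:
    print(hit["citation"], "|", hit["text"])
```

In this toy data the `roadmap.pdf` chunk is the closest match to the query vector, yet it never appears in the results because the `team` filter excludes it before ranking. That is the sense in which labels reduce noise, and the citation string is what lets a reader trace an answer back to a specific page.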


