HeadsUpAI

Google Adds Multimodal RAG and Page Citations to Gemini File Search

Google has expanded File Search, the Gemini API's managed retrieval-augmented generation (RAG) tool, which grounds model responses in external data. The update adds multimodal support built on the Gemini Embedding 2 model, allowing the system to index and search images and text together without manual preprocessing.

Embedding model: Gemini Embedding 2
Supported modalities: Text and images
Citation granularity: Page-level
Filtering method: Custom metadata key-value labels
Availability: Gemini API and Google AI Studio

Building production-grade RAG for visual data has traditionally required complex custom pipelines. By moving multimodal retrieval and custom metadata filtering into a managed tool, Google makes it simpler to give agents a searchable memory of image and document libraries. Metadata filters let queries target a defined subset of files, which reduces noise and, according to Google, speeds up search.

You can now attach key-value labels to files to pre-filter data and use page-level citations to verify answers. These features are available via the Gemini API and Google AI Studio, where Google has published an example app that demonstrates multimodal File Search. Developers can initialize a multimodal store using the genai client to start indexing mixed-modality libraries.

Google AI Developers (@googleaidevs) on X:

We’re expanding the Gemini API File Search tool 🔍 with 3 new updates that enable developers to more easily build multimodal RAG systems with enhanced precision:

+ Multimodal Support: By leveraging our Gemini Embedding 2 model, File Search can now reason across image and text simultaneously.
+ Custom Metadata Filtering: Bring structure to unstructured data by tagging files with custom key-value labels. This pre-filters your data and boosts search speed.
+ Exact citations: File Search can now capture and return the exact source (down to the page number) for every piece of information indexed.

See multimodal File Search in action with our example app in @GoogleAIStudio. Chat with your entire image and doc library, ask questions, and trace answers back to the source: https://t.co/WJFiPPpyRF

Still wondering? A few quick answers below.

Gemini API File Search is a managed infrastructure tool that automates the retrieval-augmented generation process. It handles the ingestion, chunking, and embedding of documents so developers can ground AI responses in their own data. The tool eliminates the need for developers to build and maintain their own custom vector databases or complex retrieval pipelines from scratch.
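
To give a sense of what a managed tool takes off a developer's plate, here is a minimal sketch of the chunking step a hand-built pipeline would need. It is an illustration only: the source does not describe how File Search chunks documents, and the function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not part of the Gemini API.
    Overlap keeps a sentence that straddles a boundary visible in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "File Search handles ingestion, chunking, and embedding for you."
for piece in chunk_text(sample, chunk_size=30, overlap=5):
    print(repr(piece))
```

In a self-managed pipeline, each chunk would then be embedded and written to a vector database, which is the part File Search is described as handling on the developer's behalf.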

The tool now uses the Gemini Embedding 2 model to process and index images and text within the same vector space. This allows AI agents to perform semantic searches across mixed-modality libraries, such as finding specific diagrams or photos based on natural language descriptions, without requiring developers to perform any manual image preprocessing or separate indexing.
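
The idea of a shared vector space can be sketched with a toy example. The vectors below are hand-written stand-ins, not output from Gemini Embedding 2, and the file names and the `search` helper are hypothetical. The point is only that once images and text live in the same space, one similarity ranking covers both.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy index: made-up 3-dimensional vectors standing in for real embeddings.
INDEX = [
    {"id": "wiring_diagram.png", "modality": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": "install_guide.pdf", "modality": "text", "vector": [0.7, 0.3, 0.1]},
    {"id": "team_photo.jpg", "modality": "image", "vector": [0.0, 0.2, 0.9]},
]


def search(query_vector: list[float], index: list[dict], top_k: int = 2) -> list[dict]:
    """Rank every item, regardless of modality, by similarity to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


# Pretend this is the embedding of the query "wiring diagram for the pump".
query = [1.0, 0.0, 0.0]
results = search(query, INDEX)
for item in results:
    print(item["id"], item["modality"])
```

Here the top two results are one image and one text document, returned by a single query with no separate image index.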

Custom metadata filtering allows developers to attach structured key-value labels, such as department names or document status, to unstructured files. By applying these filters during a query, the system can scope its search to specific data slices. This pre-filtering reduces noise from irrelevant documents, which can improve both the speed and the relevance of retrieval.
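
Pre-filtering can be pictured as narrowing the candidate set before any ranking happens. The sketch below is a toy model under stated assumptions, not the File Search API: the documents, the precomputed `score` values, and the `matches` and `filtered_search` helpers are all hypothetical.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """True when every requested key-value label is present on the file."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filtered_search(docs: list[dict], filters: dict) -> list[dict]:
    """Drop files that fail the filter, then rank what is left by score."""
    candidates = [doc for doc in docs if matches(doc["metadata"], filters)]
    return sorted(candidates, key=lambda doc: doc["score"], reverse=True)


# Made-up files; "score" stands in for a similarity score from retrieval.
DOCS = [
    {"id": "hr_policy_2024.pdf", "metadata": {"department": "hr", "status": "final"}, "score": 0.62},
    {"id": "hr_policy_draft.pdf", "metadata": {"department": "hr", "status": "draft"}, "score": 0.91},
    {"id": "eng_runbook.pdf", "metadata": {"department": "engineering", "status": "final"}, "score": 0.88},
    {"id": "hr_benefits.pdf", "metadata": {"department": "hr", "status": "final"}, "score": 0.75},
]

hits = filtered_search(DOCS, {"department": "hr", "status": "final"})
for doc in hits:
    print(doc["id"], doc["score"])
```

Without the filter, the draft policy would rank first on score alone. With it, only finalized HR files are considered, so fewer candidates are scored and the off-target matches never surface.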

File Search now captures and returns the exact source location, including the specific page number, for every piece of information it indexes. When a model generates a response based on retrieved documents, it can cite these exact locations. This transparency allows users to verify AI-generated answers against the original source material, building trust and aiding fact-checking.
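
One way to picture page-level citations is that each retrieved chunk carries its source file and page number, so an answer can list exactly where its supporting text came from. The `Chunk` class, the sample data, and `format_citations` below are hypothetical and do not reflect the response format of the Gemini API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage plus the location it was indexed from."""
    source: str
    page: int
    text: str


def format_citations(chunks: list[Chunk]) -> list[str]:
    """Return 'source, p. N' labels, deduplicated in first-seen order."""
    labels: list[str] = []
    for chunk in chunks:
        label = f"{chunk.source}, p. {chunk.page}"
        if label not in labels:
            labels.append(label)
    return labels


# Made-up retrieval results for illustration.
retrieved = [
    Chunk("manual.pdf", 12, "Torque the bolts to the listed value."),
    Chunk("manual.pdf", 12, "Re-check torque after the first run."),
    Chunk("specs.pdf", 3, "Bolt torque values by size."),
]

citations = format_citations(retrieved)
print("Sources: " + "; ".join(citations))
```

A reader can then open the named file at the cited page and compare the passage with the generated answer.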

The updated File Search features are available through the Gemini API and Google AI Studio. Developers can get started by using the genai client to initialize a multimodal file store and upload their document and image libraries. Detailed implementation instructions and code examples are provided in the official Gemini API documentation and the Google AI developer guide.
