Hours of video, now searchable by your agent. We just released a new set of agent skills and modular architecture for the Metropolis Blueprint for Video Search and Summarization, eliminating the need for manual configuration of multiple microservices. Load the skills into a compatible coding agent and it deploys the stack, turning hours of footage into searchable, actionable intelligence through a chat interface. Ask in plain language and get back clips, summaries, and answers.
NVIDIA Releases VSS Agent Skills to Automate Industrial Video Analytics
NVIDIA updated its Metropolis Blueprint for Video Search and Summarization (VSS) with a modular architecture and new agent skills. These skills follow the agentskills.io specification, allowing coding agents to self-install the stack. The update follows the Metropolis VSS 3 Blueprint launch and removes the manual microservice configuration previously required.
- Max concurrent streams (H100): 33
- Max concurrent streams (RTX PRO 6000): 51
- Ingestion latency (H100): 0.079 seconds
- Ingestion latency (RTX PRO 6000): 0.101 seconds
- Retrieval latency (H100): 2.24 seconds
A new profile system lets developers layer workflows like real-time alerts onto a base agent. This is powered by fusion search, which decomposes complex natural language queries into sub-queries. By searching across multiple embedding types, the system improves precision when finding specific events in massive video archives.
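To make the idea of fusing results from several embedding types concrete, here is a minimal sketch in Python. It assumes reciprocal rank fusion (RRF) as the merging rule. The article does not say which fusion method VSS actually uses, so treat this as one common technique, not NVIDIA's implementation. The sub-queries, embedding names, and clip IDs are hypothetical, illustrative data.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of clip IDs into one ranking.

    Assumption: RRF is used here purely for illustration; the fusion rule
    inside VSS is not described in the article.
    k is a smoothing constant (60 is a commonly used default for RRF).
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, clip_id in enumerate(ranked, start=1):
            # Each list contributes more for items it ranks near the top.
            scores[clip_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by clip ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical results: one ranked list per (sub-query, embedding type) pair,
# as if "forklift near a worker without a helmet" had been split in two.
results = {
    ("forklift", "visual"): ["clip_07", "clip_02", "clip_11"],
    ("forklift", "caption"): ["clip_02", "clip_07", "clip_05"],
    ("worker without helmet", "visual"): ["clip_02", "clip_09", "clip_07"],
    ("worker without helmet", "caption"): ["clip_02", "clip_05", "clip_09"],
}

fused = reciprocal_rank_fusion(list(results.values()))
print(fused)
```

A clip that ranks well across several sub-queries and embedding types rises to the top, which is the intuition behind the precision gain described above.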
You can deploy the VSS Search profile via chat prompts to agents like Codex. The system supports H100 and RTX PRO 6000 GPUs, with reported ingestion latencies of 0.079 and 0.101 seconds, respectively. These capabilities join a new feature of the NVIDIA AI-Q Agent Skill in expanding the portable tools available to autonomous coding agents.
Still wondering? A few quick answers below.
VSS agent skills are portable capabilities for the NVIDIA Metropolis Blueprint for Video Search and Summarization that follow the agentskills.io specification. These skills allow autonomous coding agents like Codex or OpenClaw to understand how to deploy, configure, and operate the VSS microservice stack through a simple natural language chat interface instead of manual setup.
The system uses a modular architecture and vision-language models to perform agentic search. It decomposes complex natural language queries into sub-queries and uses a fusion search capability to scan multiple embedding types. This process allows the AI to locate specific objects, actions, or safety events across massive volumes of live or recorded video data.
NVIDIA VSS is supported on several hardware configurations to meet different performance needs. Benchmarks provided by NVIDIA show the agentic search and alert verification workflows running on H100 and RTX PRO 6000 GPUs. The system is also compatible with DGX Spark and AGX Thor setups for specific tasks like alert verification and video summarization.
On a single H100 GPU, the agentic search workflow can handle up to 33 concurrent input streams with an ingestion latency of 0.079 seconds. When a user performs a search, the retrieval latency to receive a result is approximately 2.24 seconds. These metrics vary depending on the specific developer profile and hardware topology used.
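As a rough back-of-the-envelope sketch, the per-GPU stream limits quoted in this article can be turned into a GPU count for a given number of cameras. The function name `gpus_needed` is hypothetical, and the calculation assumes capacity scales linearly with GPU count. The article makes no such claim, and it notes that real figures vary with profile and hardware topology.

```python
import math

# Maximum concurrent streams per GPU, as quoted in the article.
MAX_STREAMS = {"H100": 33, "RTX PRO 6000": 51}


def gpus_needed(total_streams, gpu):
    """Estimate the GPU count for a number of streams.

    Assumption: capacity scales linearly per GPU, which is a simplification.
    """
    if total_streams < 0:
        raise ValueError("total_streams must be non-negative")
    return math.ceil(total_streams / MAX_STREAMS[gpu])


print(gpus_needed(100, "H100"))          # 100 / 33 rounds up to 4
print(gpus_needed(100, "RTX PRO 6000"))  # 100 / 51 rounds up to 2
```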
Developers can access VSS skills through the NVIDIA VSS GitHub repository. To use them, you need a system prepared to run VSS, such as an NVIDIA Brev Launchable instance, and a compatible coding agent like Codex or Claude Code. Once the skills are loaded, the agent can autonomously deploy the containers and manage their environment variables.


