NVIDIA launches CaP-X to turn frontier language models into autonomous robot controllers

NVIDIA

Apr 1, 2026 · Updated Apr 25, 2026

NVIDIA and academic partners released CaP-X, an open-source framework that lets large language models control robots by writing and executing code. The results suggest that off-the-shelf models such as Gemini 3 Pro can perform complex physical tasks without robotics-specific training, pointing to a shift from specialized end-to-end models toward general-purpose agentic reasoning.

CaP-X is an open-source ecosystem for agentic robotics. It includes CaP-Agent0, a training-free harness that lets models such as Gemini 3 Pro or GPT-5.2 control hardware through perception and actuation APIs, and CaP-Gym, a suite of 187 manipulation tasks spanning simulation and real robots.
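To make the idea of a training-free harness concrete, here is a minimal Python sketch of a model that writes a program against perception and actuation calls and gets the error message back if the program fails. Everything in it is an assumption for illustration: `FakeRobot`, `locate`, `move_to`, `grasp`, `run_agent`, and `scripted_model` are hypothetical names, not the CaP-X API, and the retry-with-feedback loop is one plausible harness design rather than a description of CaP-Agent0.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeRobot:
    """Hypothetical stand-in for perception and actuation APIs (not the CaP-X API)."""
    objects: dict = field(default_factory=lambda: {"red_cube": (0.4, 0.1, 0.02)})
    held: Optional[str] = None
    log: list = field(default_factory=list)

    def locate(self, name: str) -> tuple:
        """Perception: return an object's position, or raise if it is not visible."""
        if name not in self.objects:
            raise LookupError(f"object not found: {name}")
        return self.objects[name]

    def move_to(self, xyz: tuple) -> None:
        """Actuation: record a move of the end effector to xyz."""
        self.log.append(("move_to", xyz))

    def grasp(self, name: str) -> None:
        """Actuation: close the gripper on the named object."""
        self.held = name
        self.log.append(("grasp", name))


def run_agent(task: str, model: Callable[[str, str], str],
              robot: FakeRobot, max_attempts: int = 3) -> Optional[int]:
    """Ask the model for a program, execute it, and feed errors back on failure.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        program = model(task, feedback)
        try:
            # The generated program reaches the hardware through the `robot` name.
            exec(program, {"robot": robot})
        except Exception as exc:
            feedback = f"{type(exc).__name__}: {exc}"
            continue
        return attempt
    return None


def scripted_model(task: str, feedback: str) -> str:
    """Stand-in for an LLM call: guesses a wrong object name, then corrects it."""
    target = "red_cube" if feedback else "cube"
    return (
        f"pos = robot.locate({target!r})\n"
        "robot.move_to(pos)\n"
        f"robot.grasp({target!r})\n"
    )


robot = FakeRobot()
attempts = run_agent("pick up the red cube", scripted_model, robot)
print(attempts, robot.held)
```

In this sketch the first program fails at the perception call, so no motion is issued, and the second attempt succeeds after seeing the error. A real harness would replace `scripted_model` with a call to a language model and would run generated code in a sandbox, since `exec` here offers no isolation.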

The framework challenges the belief that robots require end-to-end training. Traditional Vision-Language-Action (VLA) models often fail when the environment changes, whereas CaP-X agents use high-level reasoning to adapt. According to the reported benchmarks, frontier models already exceed 30% zero-shot success and significantly outperform specialized models on perturbed tasks.

CaP-RL adds reinforcement learning on top of the gym: a 7B Qwen 2.5 Coder model rose from 20% to 72% success after 50 training iterations. The framework also synthesizes successful code into persistent skill libraries for reuse. All components are released under an MIT license.
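The skill-library idea can be sketched in a few lines of Python: a program that worked is stored under a name and description, written to disk, and later retrieved by a new task description. The `SkillLibrary` class, its JSON file layout, and the word-overlap search are assumptions made for illustration, not the CaP-X implementation.

```python
import json
import tempfile
from pathlib import Path


class SkillLibrary:
    """Hypothetical persistent store of programs that previously succeeded."""

    def __init__(self, path):
        self.path = Path(path)
        # Load skills saved by an earlier session, if the file exists.
        self.skills = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, name: str, description: str, code: str) -> None:
        """Save a successful program and write the library back to disk."""
        self.skills[name] = {"description": description, "code": code}
        self.path.write_text(json.dumps(self.skills, indent=2))

    def search(self, query: str) -> list:
        """Return skill names ranked by word overlap between query and description."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(skill["description"].lower().split())), name)
            for name, skill in self.skills.items()
        ]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "skills.json"
    library = SkillLibrary(path)
    library.add("pick_cube", "pick up a cube from the table",
                "pos = robot.locate('red_cube')\nrobot.move_to(pos)\nrobot.grasp('red_cube')\n")
    library.add("open_drawer", "open the top drawer",
                "robot.move_to((0.2, 0.0, 0.3))\n")

    # A fresh instance reads the same file, which is what makes the skills persistent.
    reloaded = SkillLibrary(path)
    hits = reloaded.search("cube")
    print(hits)
```

Word overlap is only a placeholder for retrieval. An agent would more likely match skills by embedding similarity or let the model pick from the list of descriptions, and would paste the retrieved code into its prompt as a reusable function.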

View the full update on capgym.github.io

Jim Fan

@DrJimFan · Apr 1

The power of the Claw, in the palm of a robot hand. Agentic robotics is here!

Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are “just” API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.

And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far:

- We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots.
- CaP-Gym: LLM’s first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can’t wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
- CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper.
- CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
- CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap.

3 years ago, our team created Voyager, one of the earliest agentic AI that plays and learns in Minecraft continuously. Its key ideas — skill libraries, self-reflection loops, and in-context planning — have since influenced many modern agentic designs. Today, the agent graduates from Minecraft and gets a real job.

It’s April Fool’s, but this Claw is getting its hands dirty for real! Link in thread:


