NVIDIA Outlines Technical Roadmap for Scaling Robot Dexterity and Physical AGI

NVIDIA

May 9, 2026 · Updated Jun 7, 2026

NVIDIA's Jim Fan presented Robotics: Endgame, a 20-minute technical roadmap for solving Physical AGI as a parallel to the LLM success story, at Sequoia AI Ascent. The talk covers why current Vision-Language-Action models (VLAs) fall short, the case for video world models as a second pretraining paradigm, World Action Models, EgoScale, a Dexterity Scaling Law, and DreamDojo, an end-to-end neural physics engine for scaling reinforcement learning in silico.

Fan frames the roadmap as a deliberate parallel to the LLM success story and positions the talk as the sequel to his Physical Turing Test talk, given a year earlier at the same conference.

Talk title: Robotics: Endgame
Speaker: Jim Fan (NVIDIA)
Venue: Sequoia AI Ascent
Length: ~20 minutes
Key concepts introduced: World Action Models, EgoScale, Dexterity Scaling Law, DreamDojo, Physical RL

The roadmap walks through several pieces in order: why current Vision-Language-Action models (VLAs) fall short, video world models as a second pretraining paradigm, World Action Models (WAM), strategies for robot data collection and an FSD-style physical data flywheel for robot manipulation, EgoScale and a newly discovered Dexterity Scaling Law, Physical RL as the last-mile step, and DreamDojo — an end-to-end neural physics engine for scaling reinforcement learning in silico.

You can watch the full talk on YouTube via the link in Fan's announcement tweet. The chapter markers in the tweet map directly to the segments above, so viewers can jump straight to a specific argument: World Action Models at 06:09, DreamDojo at 15:39, or the Civilizational Technology Tree predictions at 17:00.


Jim Fan

@DrJimFan · May 8

I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to my last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;) And stay till the end, more easter eggs and predictions for your polymarket!

00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.

Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.


Still wondering? A few quick answers below.

What is Robotics: Endgame?

Robotics: Endgame is a roadmap talk delivered by NVIDIA's Jim Fan at Sequoia AI Ascent. It lays out his view of how Physical AGI gets solved, structured as a parallel to the LLM success story. The talk is the sequel to his earlier Physical Turing Test talk at the same conference a year prior, and runs about 20 minutes.

What is a World Action Model?

World Action Models (WAM) are the paradigm Jim Fan introduces in Robotics: Endgame as the next step beyond current Vision-Language-Action models for robotics. The WAM chapter starts at 06:09, after a section explaining why current VLAs fall short and a section on video world models as a second pretraining paradigm.

What is DreamDojo?

DreamDojo is described in the talk as an end-to-end neural physics engine for scaling reinforcement learning in silico, that is, in simulation rather than on physical robots. Jim Fan covers DreamDojo starting at 15:39 of the Robotics: Endgame talk, framing it as the infrastructure layer that makes Physical RL scaling tractable.

What is the Dexterity Scaling Law mentioned in the talk?

Jim Fan presents EgoScale together with a Dexterity Scaling Law that he describes as recently discovered, in the chapter starting at 11:06. The tweet gives no further detail on the law itself. The chapter follows the section on robot data collection and an FSD-style physical data flywheel, which suggests the argument is that dexterity can be scaled like other capabilities once the right data pipeline is in place.



