One image + text + camera trajectory = controllable worlds. All on a single GPU. Our research team just released SANA-WM, a 2.6B open source world model natively trained for 60-second video generation with precise camera control. https://t.co/oXHRCnCRdM
NVIDIA Releases SANA-WM Open Source World Model for Minute-Long Video
NVIDIA · Updated
NVIDIA released SANA-WM, a 2.6B-parameter open-source world model that generates 60-second, 720p videos from a single image and camera path. It achieves industrial-level quality on a single GPU, enabling developers to simulate controllable environments without massive compute clusters.
- Parameters (Backbone): 2.6B
- Parameters (Refiner): 17B
- Video resolution: 720p
- Video duration: 60 seconds
- Inference hardware: Single H100 or RTX 5090
- Availability: Open source (weights and code)
This release bridges the gap between short-form clips and the Google DeepMind-style navigable environments required for robotics. By achieving industrial quality on a single GPU, NVIDIA is validating its video world model roadmap as a pretraining paradigm. The release shifts the focus from pure video generation to controllable simulation that respects physical camera paths.
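To make the idea of a camera path concrete, here is a minimal sketch of one way a 60-second trajectory could be represented as data. It is not SANA-WM's actual input format, which the announcement does not describe: the `CameraPose` fields, the `orbit_trajectory` helper, the 16 fps sampling rate, and the yaw convention are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical per-frame camera pose (not SANA-WM's real format)."""
    t: float    # timestamp in seconds
    x: float    # position, arbitrary scene units
    y: float    # camera height
    z: float
    yaw: float  # heading in the x-z plane, radians


def orbit_trajectory(duration_s=60.0, fps=16, radius=2.0, height=1.5):
    """Sample one full circular orbit around the origin.

    fps, radius, and height are assumed values chosen for illustration.
    """
    n = int(duration_s * fps)
    poses = []
    for i in range(n):
        angle = 2 * math.pi * i / n
        x = radius * math.cos(angle)
        z = radius * math.sin(angle)
        # Point the camera from its position back toward the origin.
        yaw = math.atan2(-z, -x)
        poses.append(CameraPose(t=i / fps, x=x, y=height, z=z, yaw=yaw))
    return poses


poses = orbit_trajectory()
print(len(poses), "poses; first:", poses[0])
```

A world model with camera control would consume some sequence like this alongside the starting image and text prompt, producing frames that follow the requested viewpoint rather than an arbitrary one.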
You can access the model weights, code, and paper immediately to build simulators or content tools. While training required 64 H100s, inference runs on a single H100. A distilled variant can denoise a 60-second clip in 34 seconds on an RTX 5090, making long-horizon modeling accessible for local development.
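The distilled-variant figure can be restated as a real-time factor. The short calculation below uses only the two numbers quoted above (60 seconds of video, 34 seconds of denoising on an RTX 5090); the function name is illustrative, and the result covers denoising only, so it may not reflect end-to-end time if other stages such as decoding or refinement are measured separately.

```python
def real_time_factor(clip_seconds: float, wall_seconds: float) -> float:
    """Seconds of video produced per second of wall-clock compute."""
    return clip_seconds / wall_seconds


rtf = real_time_factor(clip_seconds=60, wall_seconds=34)
print(f"{rtf:.2f}x real time")  # prints "1.76x real time"
print(f"{1 / rtf:.2f} s of compute per second of video")  # prints "0.57 s of compute per second of video"
```

A factor above 1.0 means the denoising step finishes faster than the clip takes to play back.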

