NVIDIA Releases SANA-WM Open Source World Model for Minute-Long Video

NVIDIA

May 19, 2026 · Updated Jun 12, 2026

NVIDIA released SANA-WM, a 2.6B-parameter open-source world model that generates 60-second, 720p videos from a single image and camera path. It achieves industrial-level quality on a single GPU, enabling developers to simulate controllable environments without massive compute clusters.

NVIDIA released SANA-WM, a 2.6B-parameter open-source world model (an AI system that simulates physical environments) natively trained for minute-scale video generation. It takes a single image, a text prompt, and a 6-DoF camera trajectory (three axes of translation plus three of rotation) and renders 720p video. A Hybrid Linear Attention mechanism keeps the world coherent across the full 60 seconds.
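To make the camera input concrete, here is a minimal sketch of what a 6-DoF trajectory could look like as data: one pose per frame, each with three translation and three rotation values. The `CameraPose` class, the `linear_trajectory` helper, and the 16 fps frame rate are assumptions for illustration only; the article does not state SANA-WM's actual trajectory format or frame rate.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical 6-DoF pose: 3 translation values plus 3 rotation values."""
    x: float
    y: float
    z: float
    yaw: float    # degrees
    pitch: float  # degrees
    roll: float   # degrees


def linear_trajectory(start, end, seconds, fps):
    """Return one pose per frame, linearly interpolated from start to end.

    Interpolating Euler angles linearly is a simplification that is fine
    for a sketch; real pipelines often interpolate rotations differently.
    """
    n_frames = int(round(seconds * fps))
    if n_frames < 2:
        raise ValueError("need at least two frames")
    poses = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        values = (a + t * (b - a) for a, b in zip(astuple(start), astuple(end)))
        poses.append(CameraPose(*values))
    return poses


# Assumed example: move 5 units forward while turning 90 degrees over 60 s.
start = CameraPose(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
end = CameraPose(0.0, 0.0, 5.0, 90.0, 0.0, 0.0)
trajectory = linear_trajectory(start, end, seconds=60, fps=16)
print(len(trajectory))  # 60 s * 16 fps = 960 poses
```

The point of the sketch is only that the camera path is an explicit per-frame signal, which is what lets the model be steered rather than merely prompted.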

Parameters (Backbone): 2.6B
Parameters (Refiner): 17B
Video resolution: 720p
Video duration: 60 seconds
Inference hardware: Single H100 or RTX 5090
Availability: Open source (weights and code)

This release narrows the gap between short-form clips and the Google DeepMind-style navigable environments required for robotics. By reaching industrial-level quality on a single GPU, NVIDIA supports its roadmap of video world models as a pretraining paradigm. The focus shifts from pure generation to controllable simulation that respects physical camera paths.

You can access the model weights, code, and paper immediately to build simulators or content tools. While training required 64 H100s, inference runs on a single H100. A distilled variant can denoise a 60-second clip in 34 seconds on an RTX 5090, making long-horizon modeling accessible for local development.

View the full update on nvlabs.github.io

NVIDIA AI

@NVIDIAAI · May 19

One image + text + camera trajectory = controllable worlds. All on a single GPU. Our research team just released SANA-WM, a 2.6B open source world model natively trained for 60-second video generation with precise camera control. https://t.co/oXHRCnCRdM

Still wondering? A few quick answers below.

What is NVIDIA SANA-WM?

SANA-WM is a 2.6B-parameter open-source world model designed to generate high-fidelity, minute-long videos. Unlike standard video generators, it acts as a simulator that turns a single starting image and a specific camera trajectory into a consistent 720p environment. It is specifically optimized to maintain visual coherence for a full 60 seconds.

How does the SANA-WM architecture work?

The model uses a Hybrid Linear Attention mechanism that combines frame-wise Gated DeltaNet with periodic softmax attention. This allows it to handle long-context video data efficiently without running out of memory. A two-stage pipeline first generates a base 2.6B rollout, which is then enhanced by a 17B long-video refiner to improve texture and motion quality.
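As a rough illustration of why this kind of hybrid saves memory, the sketch below implements the gated delta rule recurrence as it is usually written for Gated DeltaNet, where a fixed-size state matrix is decayed, corrected, and updated at each step. It is a token-level toy, not SANA-WM's code: how the model applies the rule frame-wise is not described here, and `layer_schedule` with its `softmax_every` period is a hypothetical stand-in for "periodic softmax attention".

```python
import numpy as np


def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of the gated delta rule.

    S has shape (d_v, d_k) and never grows with sequence length.
    alpha in (0, 1] decays old memory; beta in (0, 1] is the write strength.
    Computes S_new = alpha * S @ (I - beta * k k^T) + beta * v k^T.
    """
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, S @ q  # updated state and the read-out for query q


def layer_schedule(n_layers, softmax_every):
    """Hypothetical layout: every `softmax_every`-th layer uses softmax attention."""
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "gated_delta"
        for i in range(n_layers)
    ]


# Toy check: writing a new value under the same unit-norm key replaces the old one.
k = np.array([1.0, 0.0, 0.0])
v1 = np.array([2.0, 3.0])
v2 = np.array([5.0, 7.0])

S = np.zeros((2, 3))
S, o1 = gated_delta_step(S, k, k, v1, alpha=1.0, beta=1.0)
S, o2 = gated_delta_step(S, k, k, v2, alpha=1.0, beta=1.0)

print(o1, o2)    # first read returns v1, second returns v2
print(S.shape)   # state size is unchanged after both writes
print(layer_schedule(8, softmax_every=4))
```

Because the state stays the same size however many steps are processed, memory does not grow with clip length in the linear layers; the occasional softmax layers are the part that still attends over stored context.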

Is SANA-WM open source and available to the public?

Yes, NVIDIA has released SANA-WM as an open-source project. The release includes the model weights for both the bidirectional variant and the long-video refiner, along with the underlying code and the original research paper. Developers can access these resources on GitHub and Hugging Face to build their own controllable video simulations.

What are the hardware requirements to run SANA-WM?

While NVIDIA used 64 H100 GPUs to train the model over 15 days, it is designed for efficient inference on a single GPU. A standard H100 can generate a 60-second 720p clip. Additionally, a distilled version using specialized quantization can run on a consumer-grade RTX 5090, producing a one-minute video in approximately 34 seconds.
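The figures above translate into two quick back-of-the-envelope numbers: the total training budget in GPU-hours, and how much faster than real time the distilled variant runs. The sketch uses only the values quoted in this article and assumes all 64 GPUs ran for the full 15 days.

```python
# Training budget, assuming all 64 H100s ran for the full 15 days.
train_gpus = 64
train_days = 15
gpu_days = train_gpus * train_days   # 960 GPU-days
gpu_hours = gpu_days * 24            # 23,040 GPU-hours

# Distilled variant on an RTX 5090: 60 s of video in about 34 s of compute.
clip_seconds = 60
render_seconds = 34
realtime_factor = clip_seconds / render_seconds

print(gpu_days, gpu_hours)           # 960 23040
print(round(realtime_factor, 2))     # 1.76
```

In other words, the distilled model produces video roughly 1.8 times faster than it plays back, on a single consumer GPU.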

How does SANA-WM compare to other video models?

NVIDIA states that SANA-WM achieves visual quality comparable to large-scale industrial baselines like LingBot-World and HY-WorldPlay. However, it is significantly more efficient, offering up to 36 times higher throughput than prior open-source baselines. It also demonstrates superior accuracy in following precise 6-DoF camera trajectories compared to existing open-source world models.


