HeadsUpAI

NVIDIA releases Cosmos 3 open weights to unify physical reasoning and world generation

NVIDIA released Cosmos 3, an open-weights foundation model family for physical AI. It unifies vision reasoning, world simulation, and action generation. The release features Cosmos 3 Super, a 64B parameter model for datacenter deployment, and Cosmos 3 Nano, a 16B parameter model optimized for edge deployment on workstation-grade GPUs.

Model Variants: Cosmos 3 Super (64B), Cosmos 3 Nano (16B)
Architecture: Mixture-of-Transformers (MoT)
Quantization: BF16, FP8, NVFP4 (4-bit)
Datasets: 6 synthetic datasets (Robotics, Driving, Warehouse, and others)
Benchmarks: VANTAGE-Bench, PAI-Bench, Physics-IQ

Physical AI systems often have to orchestrate separate models for perception and generation. Cosmos 3 combines the two by pairing a reasoner tower with a generator tower to predict future states. This update follows the Cosmos 3 launch, and Artificial Analysis's performance ranking places the model first in open-weights generation.

Weights are available on Hugging Face, with training scripts on GitHub. The release includes six synthetic datasets for post-training, that is, adapting a pre-trained model to a specific task. For production, NVIDIA provides NIM microservices with 4-bit quantization for optimized deployment on Hopper and Blackwell GPUs.

NVIDIA AI (@NVIDIAAI) on X:

Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation. Today we’re releasing Super (32B) and Nano (8B) variants. https://t.co/6UfkSA7kzQ


Still wondering? A few quick answers below.

NVIDIA Cosmos 3 is an open-weights foundation model family designed for Physical AI. It unifies vision reasoning, world simulation, and action generation into a single architecture, allowing robots and autonomous vehicles to perceive their environment and predict future states within a single inference loop.

The Mixture-of-Transformers (MoT) architecture unifies reasoning and generation by pairing two specialized towers. An autoregressive reasoner tower interprets multimodal observations like video and text, while a diffusion-based generator tower produces physics-aware video and action sequences conditioned on that reasoning.
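The data flow described above can be illustrated with a toy sketch. This is not NVIDIA's implementation and does not use any Cosmos API: the function names `reasoner_tower`, `generator_tower`, and `predict_future` are hypothetical, observations are plain floats, and the "denoising" loop is a crude stand-in for a real diffusion process. It only shows the ordering: the reasoner reads the observations step by step, and the generator refines noise into an output conditioned on the reasoner's result.

```python
import random


def reasoner_tower(observation_tokens):
    """Toy stand-in for the autoregressive reasoner (hypothetical).

    Folds the tokens left to right into a single context value, so each
    step depends on everything read before it.
    """
    context = 0.0
    for token in observation_tokens:
        context = 0.5 * context + 0.5 * token
    return context


def generator_tower(context, num_frames=4, num_steps=10, seed=0):
    """Toy stand-in for the diffusion-based generator (hypothetical).

    Starts from random noise and repeatedly moves each frame halfway
    toward a target that depends only on the reasoner's context.
    """
    rng = random.Random(seed)
    frames = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
    # Assumed "clean" signal: frame i drifts linearly away from the context.
    targets = [context + 0.1 * i for i in range(num_frames)]
    for _ in range(num_steps):
        frames = [f + 0.5 * (t - f) for f, t in zip(frames, targets)]
    return frames


def predict_future(observation_tokens):
    """Run reasoning first, then generation conditioned on its output."""
    context = reasoner_tower(observation_tokens)
    return context, generator_tower(context)


context, frames = predict_future([1.0, 2.0, 3.0])
print(context, frames)
```

In a real system both towers would be large transformers and the context would be a sequence of hidden states rather than one number; the sketch keeps only the conditioning relationship between the two stages.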

Cosmos 3 Super is a 64B parameter model designed for high-quality reasoning and large-scale synthetic data generation in datacenters. Cosmos 3 Nano is a 16B parameter version optimized for efficient inference on workstation-grade GPUs, making it suitable for real-time robotics applications.

Developers can access model weights on Hugging Face and training scripts on GitHub. For production deployment, NVIDIA provides NIM microservices that include optimizations like 4-bit quantization and efficient video sampling to accelerate inference on Hopper and Blackwell GPUs.
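To see why lower-precision formats matter for deployment, a rough back-of-the-envelope estimate of weight storage helps. The sketch below assumes the parameter counts given in this update (64B and 16B) and nominal sizes of 2 bytes, 1 byte, and half a byte per parameter for BF16, FP8, and 4-bit storage. It ignores the scaling factors that low-bit formats typically store alongside the weights, as well as activations and any other runtime memory, so real footprints will be somewhat larger.

```python
# Nominal storage per parameter, in bytes (assumption: no scale-factor overhead).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

# Parameter counts as stated in this update.
MODELS = {"Cosmos 3 Super": 64e9, "Cosmos 3 Nano": 16e9}


def weight_gb(num_params, fmt):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for name, num_params in MODELS.items():
    for fmt in BYTES_PER_PARAM:
        print(f"{name} {fmt}: ~{weight_gb(num_params, fmt):.0f} GB")
```

Under these assumptions the 64B model drops from roughly 128 GB of weights at BF16 to roughly 32 GB at 4 bits, and the 16B model from roughly 32 GB to roughly 8 GB.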

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
