NVIDIA's Cosmos 3 lands at #1 among open weights models in both Text to Image and Image to Video on the Artificial Analysis Leaderboards! Cosmos 3 is a family of omnimodal world models for Physical AI from NVIDIA. It unifies language, image, video, audio and action in a single Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator.
The family comes in four variants. There are two base models: Nano (16B: 8B reasoner tower + 8B generator tower) and Super (64B: 32B reasoner tower + 32B generator tower). There are also Text2Image and Image2Video fine-tunes of Super, and these fine-tunes are the versions listed in the Artificial Analysis Arena Leaderboards.
Cosmos3-Super-Text2Image (agentic) runs through an agentic prompt-upsampling harness and takes the #1 open weights spot in Text to Image, surpassing HiDream-O1-Image-Dev-2604, Alibaba's Qwen Image Max 2512 and Black Forest Labs' FLUX.2 [dev]. Cosmos3-Super-Image2Video takes #1 open weights in Image to Video (No Audio), ahead of Lightricks' LTX-2 and Alibaba's Wan 2.2 A14B.
Cosmos 3 generators take structured JSON prompts rather than plain text, so prompt upsampling is needed to reproduce these results. Upsampling can be handled by an external harness or by the model's own reasoner branch, so Cosmos 3 can also run self-contained.
Cosmos 3 is fully open under the OpenMDW 1.1 license and ships with weights, code, curated datasets and fine-tuning recipes on Hugging Face. First-party and third-party APIs are expected over the next few weeks, with pricing to follow.
NVIDIA Cosmos 3 takes top open weights rank with agentic reasoning
Cosmos 3 Super uses a Mixture-of-Transformers architecture. It pairs an autoregressive reasoner with a diffusion generator, which is a model that creates data by learning to reverse a noise process. This lets one model handle language, image, video, audio and action.
- Model: Cosmos 3 Super
- Image2Video Elo: 1,255 (Cosmos3-Super-Image2Video)
- Parameters: 64B (32B reasoner + 32B generator)
- License: OpenMDW 1.1
- Variants: Nano (16B) and Super (64B)
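An Elo figure such as 1,255 only means something relative to the other models on the same leaderboard. The formula below assumes the standard Elo logistic model. That is an assumption, because the leaderboard's exact fitting procedure is not described here. Under it, the rating gap between model A and model B maps to the expected rate at which voters prefer A:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
```

As a hypothetical example, a 50-point lead ($R_A - R_B = 50$) gives $E_A = 1/(1 + 10^{-0.125}) \approx 0.57$. The higher-rated model would then be preferred in roughly 57% of pairwise comparisons.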
Cosmos 3 unseats HiDream-O1-Image-Dev-2604 at the top of the open-weights text-to-image rankings. Proprietary systems such as Grok-Imagine-Video-1.5 still hold the overall lead. Even so, NVIDIA's release under the OpenMDW 1.1 license offers a high-performance alternative for local use. The result comes alongside NVIDIA's Nemotron 3 Ultra, which leads open-weights intelligence benchmarks.
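Weights, code, curated datasets and fine-tuning recipes are available on Hugging Face. The Cosmos 3 generators take structured JSON prompts rather than plain text. To handle this, the system uses agentic prompt upsampling, in which an AI model expands a simple instruction into a detailed, structured prompt. This step can run in an external harness or in the model's own reasoner branch. First-party and third-party APIs are expected to launch in the coming weeks, with pricing to follow.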
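The following minimal Python sketch shows the shape of the upsampling step: plain text goes in and a structured JSON prompt comes out. It rests on several assumptions. The field names (`subject`, `scene`, `camera`, `style`) and the functions `upsample_prompt` and `toy_expand` are hypothetical and are not Cosmos 3's actual prompt schema or API. `toy_expand` is a fixed-template stand-in for the model call that a real agentic harness or the reasoner branch would make.

```python
import json
from typing import Callable

# HYPOTHETICAL field names for illustration only; the real Cosmos 3 JSON
# prompt schema is defined in NVIDIA's release materials.
REQUIRED_FIELDS = ("subject", "scene", "camera", "style")


def upsample_prompt(user_text: str, expand: Callable[[str, str], str]) -> str:
    """Expand a plain-text instruction into a structured JSON prompt string.

    `expand(field, user_text)` stands in for the model call (an external
    agentic harness or the model's own reasoner) that writes one field.
    """
    prompt = {field: expand(field, user_text) for field in REQUIRED_FIELDS}
    # Reject output where the expander left any field blank.
    missing = [field for field, value in prompt.items() if not value.strip()]
    if missing:
        raise ValueError(f"upsampler left fields empty: {missing}")
    return json.dumps(prompt, indent=2)


def toy_expand(field: str, user_text: str) -> str:
    """HYPOTHETICAL stand-in for an LLM call, using fixed templates."""
    templates = {
        "subject": user_text,
        "scene": f"a realistic setting suited to: {user_text}",
        "camera": "static medium shot at eye level",
        "style": "photorealistic, natural lighting",
    }
    return templates[field]


if __name__ == "__main__":
    print(upsample_prompt("a robot arm stacking red blocks", toy_expand))
```

The expander is passed in as a function, so the sketch does not depend on where upsampling runs. This matches the two options described earlier: an external harness or the model's own reasoner branch.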
Every HeadsUpAI update is written based on its original source and reviewed before it's published.