NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA

NVIDIA has shipped Nemotron 3 Ultra, a 550B-parameter Mixture-of-Experts (MoE) open model designed for long-running AI agents. According to NVIDIA, the model delivers 5x faster inference and up to 30% lower cost on complex agentic tasks than other open frontier models, with the aim of making autonomous workflows more efficient and accessible.

Nemotron 3 Ultra has 550B total parameters, of which 55B are active for any given token, and is built for long-running AI agents. NVIDIA claims 5x faster inference (inference is the process of running a trained model) and up to 30% lower cost on complex agentic tasks, both measured against other open frontier models.
Total Parameters: 550B
Active Parameters: 55B
Inference Speed: 5x faster vs. other open frontier models
Cost Reduction: Up to 30% lower for agentic tasks
Architecture: Hybrid Mamba-Transformer MoE
Licensing: OpenMDW-1.1

NVIDIA says the model's hybrid Mamba-Transformer MoE architecture lets agents fit more reasoning cycles into the same time budget, which targets a key challenge in multi-agent workflows. The company also claims leading accuracy on agent productivity, coding, and long-horizon planning, including working across large codebases and synthesizing hundreds of sources.
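
To put the headline figures in concrete terms, here is a minimal arithmetic sketch in Python. The parameter counts, the 5x speedup, and the 30% ceiling on cost reduction come from the announcement. The time budget, per-cycle latency, and baseline cost are hypothetical values chosen only for illustration. The sketch also assumes the speedup applies uniformly to every reasoning cycle, which the announcement does not state.

```python
# Figures quoted in the announcement.
TOTAL_PARAMS_B = 550        # total parameters, in billions
ACTIVE_PARAMS_B = 55        # active parameters, in billions
SPEEDUP = 5.0               # claimed inference speedup vs. other open frontier models
MAX_COST_REDUCTION = 0.30   # claimed upper bound on cost reduction


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active at once."""
    return active_b / total_b


def cycles_in_budget(budget_s: float, baseline_cycle_s: float, speedup: float = 1.0) -> int:
    """Whole reasoning cycles that fit in a time budget.

    Assumption (not from the source): every cycle speeds up by the same factor.
    """
    return int(budget_s * speedup // baseline_cycle_s)


def cost_floor(baseline_cost: float, max_reduction: float) -> float:
    """Lowest cost implied by an 'up to' reduction claim."""
    return baseline_cost * (1.0 - max_reduction)


if __name__ == "__main__":
    # Hypothetical workload: a 10-minute budget, 20 s per cycle on a baseline
    # model, and a baseline task cost of 10.00 (arbitrary currency units).
    budget_s, cycle_s, baseline_cost = 600.0, 20.0, 10.0

    print(f"active fraction: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.0%}")
    print(f"baseline cycles: {cycles_in_budget(budget_s, cycle_s)}")
    print(f"cycles at 5x:    {cycles_in_budget(budget_s, cycle_s, SPEEDUP)}")
    print(f"cost floor:      {cost_floor(baseline_cost, MAX_COST_REDUCTION):.2f}")
```

With these illustrative inputs, one tenth of the parameters are active, and the same budget holds five times as many cycles. Because the cost figure is an "up to" claim, `cost_floor` gives the best case, not an expected value.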

Nemotron 3 Ultra is fully open, including weights and training recipes, and is post-trained for popular agent harnesses. It is available on Hugging Face and various inference platforms. NVIDIA released it alongside two new models: Nemotron 3.5 Content Safety for guardrails and Nemotron 3.5 ASR for multilingual voice.

NVIDIA AI (@NVIDIAAI) on X:

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models. https://t.co/FEXqvfzQFO

