NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA

Jun 4, 2026 · Updated Jun 12, 2026

NVIDIA has shipped Nemotron 3 Ultra, an open 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters, built for long-running AI agents. Compared with other open frontier models, it delivers 5x faster inference (that is, running the trained model) and up to 30% lower cost on complex agentic tasks, with the aim of making autonomous workflows more efficient and accessible.
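The update does not describe how Nemotron 3 Ultra routes tokens to its experts. As a generic illustration of what "55B active out of 550B total" means, the minimal Python sketch below shows top-k MoE routing: a router scores every expert, only the k highest-scoring experts run for a given token, and their outputs are blended using gate weights renormalized over those k. The eight scalar "experts", k=2, and the names `moe_forward` and `softmax` are hypothetical choices for the example, not NVIDIA's implementation.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_scores: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Generic top-k MoE step (illustrative only): just k experts run for this token."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_scores[i] for i in top])
    # Weighted sum of the selected experts' outputs; the others are never evaluated.
    return sum(g * experts[i](x) for g, i in zip(gates, top))


if __name__ == "__main__":
    # Hypothetical experts: expert i multiplies its input by (i + 1).
    experts = [lambda v, s=s: v * s for s in range(1, 9)]
    scores = [0.1, 2.0, 0.3, 2.0, -1.0, 0.0, 0.5, 0.2]
    print(moe_forward(1.0, scores, experts, k=2))  # 3.0
```

Only two of the eight experts execute here, which is the same idea, at toy scale, as a model storing 550B parameters but using about 55B of them per token.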

Total Parameters: 550B
Active Parameters: 55B
Inference Speed: 5x faster vs. other open frontier models
Cost Reduction: Up to 30% lower for agentic tasks
Architecture: Hybrid Mamba-Transformer MoE
Licensing: OpenMDW-1.1
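The parameter counts above imply some quick arithmetic. The Python sketch below works it through using a common rule of thumb (roughly 2 FLOPs per active parameter per generated token in a forward pass) and assumed 1-byte and 2-byte weight precisions; both are approximations chosen for illustration, and the estimate ignores attention, state-space, and routing overheads. It shows why an MoE model can be cheap per token yet still needs memory for all 550B parameters.

```python
TOTAL_PARAMS = 550e9   # total parameters, from the spec list
ACTIVE_PARAMS = 55e9   # parameters used for each token, from the spec list


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that participate in one token."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rule of thumb (assumption): about 2 FLOPs per active parameter per token."""
    return 2.0 * active


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """Memory to hold all weights; every expert must be resident, not just the active ones."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")  # 10%
    moe = approx_forward_flops_per_token(ACTIVE_PARAMS)
    dense = approx_forward_flops_per_token(TOTAL_PARAMS)
    print(f"~{moe:.2e} FLOPs/token vs ~{dense:.2e} if all 550B were active")
    for nbytes in (1, 2):  # assumed 8-bit and 16-bit weights
        print(f"{nbytes}-byte weights: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB")
```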

Its hybrid Mamba-Transformer MoE architecture lets the model run more reasoning cycles within the same time budget, easing a key bottleneck for multi-agent workflows. NVIDIA reports leading accuracy on agent productivity, coding, and long-horizon planning, and says the model can work across large codebases and synthesize hundreds of sources.
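To make "more reasoning cycles within the same time budget" concrete, the Python sketch below counts how many agent steps fit in a fixed wall-clock budget when each step must finish generating before the next one begins. The 60 tokens/s baseline, the 2,000-token step size, the 10-minute budget, and the name `steps_in_budget` are hypothetical; only the 5x speedup comes from the article. The model counts decode time only and ignores prompt processing, tool-call latency, and network time.

```python
def steps_in_budget(tokens_per_second: float, tokens_per_step: int, budget_seconds: float) -> int:
    """Whole agent steps whose generation finishes within the budget (decode time only)."""
    if tokens_per_second <= 0 or tokens_per_step <= 0:
        raise ValueError("throughput and step size must be positive")
    return int(budget_seconds * tokens_per_second // tokens_per_step)


BASELINE_TPS = 60.0      # hypothetical baseline throughput, tokens/s
SPEEDUP = 5.0            # "5x faster inference" from the article
TOKENS_PER_STEP = 2_000  # hypothetical reasoning + tool-call tokens per step
BUDGET_S = 600.0         # hypothetical 10-minute budget

if __name__ == "__main__":
    base = steps_in_budget(BASELINE_TPS, TOKENS_PER_STEP, BUDGET_S)
    fast = steps_in_budget(BASELINE_TPS * SPEEDUP, TOKENS_PER_STEP, BUDGET_S)
    print(base, fast)  # 18 90
```

Because decode time scales inversely with throughput, a 5x speedup yields roughly 5x as many steps in the same budget; it is exactly 5x here only because the example numbers divide evenly.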

Nemotron 3 Ultra is fully open, including its weights and training recipes, and is post-trained to work with popular agent harnesses. It is available on Hugging Face and on various inference platforms. NVIDIA also released two companion models: Nemotron 3.5 Content Safety for guardrails and Nemotron 3.5 ASR for multilingual speech recognition.

View the full update on developer.nvidia.com

NVIDIA AI

@NVIDIAAI · Jun 4

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models. https://t.co/FEXqvfzQFO

Keep reading

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

NVIDIA has released Nemotron 3 Ultra, an open model designed for long-running AI agents, along with a tutorial covering setup and demos. The model aims to make complex, multi-step agentic workflows faster and more cost-effective through high throughput and efficient reasoning.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama · Jun 7

Ollama has made NVIDIA's Nemotron 3 Ultra available on its cloud service. The 550-billion-parameter Mixture-of-Experts (MoE) model is designed for long-running AI agents and delivers 5x faster inference and up to 30% lower costs for complex agentic tasks.

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain · Jun 7

LangChain announced immediate support for NVIDIA Nemotron 3 Ultra, an open frontier model designed for long-running AI agents. This integration makes the model's 5x faster inference and up to 30% lower cost for complex agentic tasks directly available to developers using the LangChain framework.

NVIDIA Nemotron 3 Ultra Claims Top US Open Weights Intelligence Spot

Artificial Analysis · Jun 1

NVIDIA released Nemotron 3 Ultra, a 550B-parameter model that ranks first among US open-weights models in Artificial Analysis's intelligence ranking, with a score of 48. It also delivers output speeds above 300 tokens per second, significantly outpacing similarly sized frontier models from China.