NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

NVIDIA

Jun 5, 2026 · Updated Jun 20, 2026

NVIDIA has released Nemotron 3 Ultra, an open model designed for long-running AI agents, and provided a tutorial for its setup and demonstrations. This model aims to make complex, multi-step agentic workflows faster and more cost-effective by delivering high throughput and efficient reasoning.

NVIDIA has released Nemotron 3 Ultra, an open 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters, optimized for agent orchestration and reasoning in long-running AI agent workflows. It is post-trained for agent harnesses, enabling multi-turn planning, tool use, and error recovery, integrating Hybrid Mamba-Transformer layers for efficient long-context handling.

Total Parameters: 550B
Active Parameters: 55B
Inference Throughput: Up to 5x higher
Cost Reduction for Agentic Tasks: Up to 30%
Licensing: OpenMDW-1.1

This model addresses challenges in multi-agent systems where token counts and costs grow quickly. Nemotron 3 Ultra delivers up to 5x higher throughput and can lower the cost for agentic tasks by up to 30% compared to other open models, making autonomous workflows more efficient.

Nemotron 3 Ultra ships with open weights, data, and recipes under the OpenMDW-1.1 license, the work of the NVIDIA Nemotron Coalition. It's already on Perplexity Pro, OpenRouter, and Hugging Face, and plugs into agent frameworks like Hermes Agent and OpenCode.

View the full update on developer.nvidia.com

NVIDIA AI

@NVIDIAAIJun 5

Nemotron 3 Ultra is here, and we've got just the tutorial to get you going. Here's how to set up Ultra in your favorite agentic harness + some great demos of the model's capabilities 👇 https://t.co/jylnpBMyj3

17122

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from NVIDIA →

Keep reading

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA has shipped Nemotron 3 Ultra, a 550B Mixture-of-Experts (MoE) open model designed for long-running AI agents. This model delivers 5x faster inference and up to 30% lower cost for complex agentic tasks compared to other open frontier models, aiming to make autonomous workflows more efficient and accessible.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

OllamaJun 7

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChainJun 7

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain announced immediate support for NVIDIA Nemotron 3 Ultra, an open frontier model designed for long-running AI agents. This integration makes the model's 5x faster inference and up to 30% lower cost for complex agentic tasks directly available to developers using the LangChain framework.

PerplexityJun 5

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity has made NVIDIA's Nemotron 3 Ultra model available to its Pro and Max subscribers on its platform and Perplexity Computer. This open model is designed for long-running AI agents, offering faster task completion and reduced costs for complex, multi-step workflows.