Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama

Jun 7, 2026 · Updated Jun 12, 2026

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

Ollama now hosts NVIDIA Nemotron 3 Ultra, a 550 billion parameter (55B active) open model, on its cloud. This Mixture of Experts (MoE) (an architecture using multiple specialized sub-networks) model is built for high-throughput reasoning and long-running agent workflows, supporting a 1 million token context window.

Cloud Model: nemotron-3-ultra:cloud
Agentic Tools: Claude Code, Hermes Agent, OpenClaw
Access: ollama launch (agents), ollama run (chat)

The model is optimized for NVIDIA's 4-bit floating point format (NVFP4), enabling 5x faster inference and up to 30% lower costs for complex agentic tasks compared to other open frontier models. Nemotron 3 Ultra leads on agent productivity, instruction following, and long-context benchmarks.

Nemotron 3 Ultra is accessible via Ollama's cloud for general chat and integrates with agentic coding tools like Claude Code, Hermes Agent, and OpenClaw. You can launch these applications with the model using simple ollama launch commands, enabling access to advanced agentic capabilities via Ollama's cloud.

View the full update on ollama.com

ollama

@ollamaJun 4

NVIDIA’s Nemotron 3 Ultra is available on Ollama’s cloud! Try it 👇 Claude Code: ollama launch claude --model nemotron-3-ultra:cloud Hermes Agent: ollama launch hermes --model nemotron-3-ultra:cloud OpenClaw: ollama launch openclaw --model nemotron-3-ultra:cloud General chat: ollama run nemotron-3-ultra:cloud Model page 👇👇👇

49554

View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Ultra is a 550 billion parameter open model, with 55 billion active parameters, designed for high-throughput reasoning and long-running AI agent workflows. It uses a Mixture of Experts architecture and supports a 1 million token context window.

The model is optimized for NVIDIA's 4-bit floating point format, which allows for 5x faster inference and up to 30% lower costs for complex agentic tasks compared to other open frontier models. It performs strongly on agent productivity, instruction following, and long-context benchmarks.

NVIDIA Nemotron 3 Ultra is available on Ollama's cloud. You can use it for general chat or integrate it with agentic coding tools like Claude Code, Hermes Agent, and OpenClaw by using the ollama launch command.

Nemotron 3 Ultra supports a 1 million token context window. This allows it to process entire codebases, long tool histories, and extensive research trails without losing context during multi-step agentic workflows.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Ollama →

Keep reading

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA has shipped Nemotron 3 Ultra, a 550B Mixture-of-Experts (MoE) open model designed for long-running AI agents. This model delivers 5x faster inference and up to 30% lower cost for complex agentic tasks compared to other open frontier models, aiming to make autonomous workflows more efficient and accessible.

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChainJun 7

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain announced immediate support for NVIDIA Nemotron 3 Ultra, an open frontier model designed for long-running AI agents. This integration makes the model's 5x faster inference and up to 30% lower cost for complex agentic tasks directly available to developers using the LangChain framework.

PerplexityJun 5

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity has made NVIDIA's Nemotron 3 Ultra model available to its Pro and Max subscribers on its platform and Perplexity Computer. This open model is designed for long-running AI agents, offering faster task completion and reduced costs for complex, multi-step workflows.

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents

VercelJun 4

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents

Vercel has integrated NVIDIA Nemotron 3 Ultra into its AI Gateway, making the large US open-weight model available for developers. This provides a cost-effective option for building and orchestrating complex, multi-step AI agent workflows, with up to 30% lower costs for agentic tasks.

What is NVIDIA Nemotron 3 Ultra?

How does Nemotron 3 Ultra improve AI agent performance?

How can I access Nemotron 3 Ultra on Ollama?

What is the context window for Nemotron 3 Ultra?

Keep reading

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents

Keep reading

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents

Vercel AI Gateway Adds NVIDIA Nemotron 3 Ultra for Cost-Efficient Agents