Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. The 550-billion-parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs on complex agentic tasks than other open frontier models.

Ollama now hosts NVIDIA Nemotron 3 Ultra, an open model with 550 billion total parameters (55 billion active), on its cloud. The model uses a Mixture of Experts (MoE) architecture, which is built from multiple specialized sub-networks, and targets high-throughput reasoning and long-running agent workflows. It supports a 1 million token context window.
Cloud Model: nemotron-3-ultra:cloud
Agentic Tools: Claude Code, Hermes Agent, OpenClaw
Access: ollama launch (agents), ollama run (chat)

The model is optimized for NVIDIA's 4-bit floating point format (NVFP4), enabling 5x faster inference and up to 30% lower costs for complex agentic tasks compared to other open frontier models. Nemotron 3 Ultra leads on agent productivity, instruction following, and long-context benchmarks.
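
To put the parameter counts and the 4-bit format in perspective, here is a back-of-the-envelope sketch. Only the 550 billion and 55 billion figures come from this update; the 16-bit baseline is an assumption chosen for comparison, and the estimate ignores the scaling metadata that real 4-bit formats store as well as any runtime overhead. It illustrates weight storage only and does not reproduce the speed or cost figures above.

```python
# Back-of-the-envelope figures. Only the parameter counts come from the article.
TOTAL_PARAMS = 550e9   # total parameters
ACTIVE_PARAMS = 55e9   # parameters active per token


def weight_gigabytes(num_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in GB (10^9 bytes), ignoring scale factors and overhead."""
    return num_params * bits_per_weight / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)  # assumed 16-bit baseline for comparison
fp4_gb = weight_gigabytes(TOTAL_PARAMS, 4)    # 4 bits per weight

print(f"active fraction per token: {active_fraction:.0%}")
print(f"16-bit weights: {bf16_gb:.0f} GB")
print(f"4-bit weights:  {fp4_gb:.0f} GB")
```

Under these assumptions, storing each weight in 4 bits takes a quarter of the space of a 16-bit copy, and only about a tenth of the parameters take part in any single token.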

Nemotron 3 Ultra is accessible via Ollama's cloud for general chat and integrates with agentic coding tools such as Claude Code, Hermes Agent, and OpenClaw. You can start any of these tools with the model using the ollama launch command, or open a general chat session with ollama run.
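
The launch commands quoted in this update all follow one pattern, so they are easy to script. The sketch below only builds the argument lists and prints them. The helper names `launch_command` and `chat_command` are hypothetical, and actually running the commands presumably requires an Ollama install with access to its cloud.

```python
import shlex

MODEL = "nemotron-3-ultra:cloud"  # model tag quoted in the article

# Integration names as they appear in the quoted `ollama launch` commands.
AGENT_APPS = ("claude", "hermes", "openclaw")


def launch_command(app: str, model: str = MODEL) -> list[str]:
    """Build the argv for `ollama launch <app> --model <model>` (hypothetical helper)."""
    if app not in AGENT_APPS:
        raise ValueError(f"unknown integration: {app!r}")
    return ["ollama", "launch", app, "--model", model]


def chat_command(model: str = MODEL) -> list[str]:
    """Build the argv for a general chat session: `ollama run <model>`."""
    return ["ollama", "run", model]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for app in AGENT_APPS:
        print(shlex.join(launch_command(app)))
    print(shlex.join(chat_command()))
```

To execute one of them, the list can be passed to `subprocess.run(...)`. Keeping the arguments as a list avoids shell quoting issues.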

[Figure: Comparative performance benchmark of Nemotron 3 Ultra against GLM 5.1, Kimi K2.6, and Qwen3.5 across seven key metrics.]
Ollama (@ollama) on X:

NVIDIA’s Nemotron 3 Ultra is available on Ollama’s cloud! Try it 👇

Claude Code: ollama launch claude --model nemotron-3-ultra:cloud
Hermes Agent: ollama launch hermes --model nemotron-3-ultra:cloud
OpenClaw: ollama launch openclaw --model nemotron-3-ultra:cloud
General chat: ollama run nemotron-3-ultra:cloud


Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Ultra is a 550 billion parameter open model, with 55 billion active parameters, designed for high-throughput reasoning and long-running AI agent workflows. It uses a Mixture of Experts architecture and supports a 1 million token context window.
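
As a rough illustration of why a Mixture of Experts model can have far fewer active parameters than total parameters, here is a toy top-k routing sketch. It is not Nemotron's actual router: the expert count, the value of k, the random router scores, and the trivial "experts" are all made up for the example.

```python
import math
import random

NUM_EXPERTS = 8  # hypothetical expert count
TOP_K = 2        # hypothetical number of experts used per token


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, top_k=TOP_K):
    """Pick the top_k experts; return (index, weight) pairs with weights summing to 1."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(x, experts, router_scores, top_k=TOP_K):
    """Weighted sum of the chosen experts' outputs; unchosen experts never run."""
    return sum(w * experts[i](x) for i, w in route(router_scores, top_k))


# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
picked = route(scores)
print("experts used:", [i for i, _ in picked], "of", NUM_EXPERTS)
print("layer output for x=1.0:", moe_layer(1.0, experts, scores))
```

Only the selected experts do any work for a given input, which is the general idea behind a model reporting both a total and an active parameter count.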

The model is optimized for NVIDIA's 4-bit floating point format, which allows for 5x faster inference and up to 30% lower costs for complex agentic tasks compared to other open frontier models. It performs strongly on agent productivity, instruction following, and long-context benchmarks.

NVIDIA Nemotron 3 Ultra is available on Ollama's cloud. You can use it for general chat or integrate it with agentic coding tools like Claude Code, Hermes Agent, and OpenClaw by using the ollama launch command.

Nemotron 3 Ultra supports a 1 million token context window. This allows it to process entire codebases, long tool histories, and extensive research trails without losing context during multi-step agentic workflows.
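
To get a feel for what fits in such a window, the sketch below estimates a token count from character length. The 4-characters-per-token ratio and the output reserve are assumptions for illustration, not properties of this model's tokenizer, so treat the result as a rough guide only. The function names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated in the article
CHARS_PER_TOKEN = 4.0              # assumption: rough heuristic, real tokenizers vary


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(documents, reserve_for_output=50_000, window=CONTEXT_WINDOW_TOKENS):
    """Return (fits, estimated_tokens) for a list of strings.

    reserve_for_output is a hypothetical budget kept free for the model's reply.
    """
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserve_for_output <= window, used


if __name__ == "__main__":
    sample_files = ["def main():\n    pass\n" * 1000, "# README\n" * 200]
    fits, used = fits_in_context(sample_files)
    print(f"estimated tokens: {used}, fits: {fits}")
```

For an accurate count, the estimate would need to be replaced with the model's real tokenizer.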

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
