NVIDIA Nemotron 3 Ultra Claims Top US Open Weights Intelligence Spot

Artificial Analysis

Jun 1, 2026 · Updated Jun 10, 2026

NVIDIA released Nemotron 3 Ultra, a 550B-parameter model that leads US open-weights models with a score of 48 on the Artificial Analysis Intelligence Index. On a pre-release endpoint it served over 300 tokens per second, well ahead of the speeds at which similarly sized open-weights models from China-based labs are typically served.

Artificial Analysis evaluated the new NVIDIA Nemotron 3 Ultra following its Computex debut. The model uses a sparse architecture with 550 billion total parameters, of which only 55 billion are active (used to process any given token), to balance reasoning depth with efficiency.

Total Parameters: 550B
Active Parameters: 55B
Intelligence Index Score: 48
Inference Speed: 300+ tokens per second
Sparsity: 90%
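
The sparsity figure follows from the two parameter counts: it is the share of parameters that are not active for a given token. A minimal sketch of that arithmetic, using the rounded figures listed above (the function and constant names are illustrative, not taken from NVIDIA or Artificial Analysis):

```python
def sparsity(total_params: float, active_params: float) -> float:
    """Return the fraction of parameters that are inactive for a given token."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected total_params > 0 and 0 <= active_params <= total_params")
    return 1.0 - active_params / total_params


# Rounded figures reported for Nemotron 3 Ultra.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9

print(f"Sparsity: {sparsity(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
```

With 55B of 550B parameters active, one tenth of the model is used per token, which matches the reported 90% sparsity.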

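This release shifts the open-weights landscape, unseating Gemma 4 31B (39) as the US open-weights intelligence leader. Nemotron 3 Ultra still trails the Chinese-led open-weights frontier, where Kimi K2.6 scores 54, but it holds a large speed advantage. It served over 300 tokens per second, while similarly sized models from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at 50-100 tokens per second, a gap of roughly 3x or more.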

The model extends the family introduced in the Nemotron 3 launch and follows the Nemotron 3 Super launch. It accompanies the NVIDIA Cosmos 3 release and is available as BF16 weights; NVFP4 quantization, a lower-precision format intended to speed up inference, is coming soon. Artificial Analysis benchmarked the model on a pre-release DeepInfra endpoint.
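
For a rough sense of what the two weight formats mean for serving, the sketch below estimates weight storage only. It rests on assumptions that are not in the source: 16 bits per parameter for BF16, about 4.5 bits per parameter for NVFP4 (4-bit values plus one 8-bit scale per 16-value block), and every weight being quantized, which may not match the released checkpoint. KV cache and activations are ignored, and all names are illustrative.

```python
# Assumed storage cost per parameter, in bits (not figures from the article).
BITS_PER_PARAM = {
    "BF16": 16.0,
    "NVFP4": 4.0 + 8.0 / 16.0,  # assumption: 4-bit value + one 8-bit scale per 16 values
}


def weight_gigabytes(num_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) for the given format."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9


TOTAL_PARAMS = 550e9  # rounded total parameter count reported for the model

for fmt in BITS_PER_PARAM:
    print(f"{fmt}: ~{weight_gigabytes(TOTAL_PARAMS, fmt):,.0f} GB of weights")
```

Under these assumptions the weights shrink by a factor of about 3.6, which is one reason a quantized release can be cheaper and faster to serve. Actual savings depend on which layers are kept in higher precision.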

View the full update on artificialanalysis.ai

Artificial Analysis

@ArtificialAnlys · Jun 1

NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model.

We partnered with @nvidia to evaluate this model for intelligence and speed - these figures use the model’s BF16 weights, but as with Nemotron 3 Super the model will be made available in NVFP4 quantization as well for higher inference performance.

➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).

➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at speeds of 50-100 tokens per second in the market today. gpt-oss-120b is served at speeds similar to this level, but with significantly lower intelligence.

➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.

We’ll be sharing additional analysis and full benchmarks at release.

Keep reading

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA has shipped Nemotron 3 Ultra, a 550B Mixture-of-Experts (MoE) open model designed for long-running AI agents. This model delivers 5x faster inference and up to 30% lower cost for complex agentic tasks compared to other open frontier models, aiming to make autonomous workflows more efficient and accessible.

Artificial Analysis Ranks Nemotron 3 Ultra Fastest for Agentic Tasks

Artificial Analysis · Jun 4

Artificial Analysis evaluated NVIDIA's newly launched Nemotron 3 Ultra, finding it completes agentic tasks significantly faster than peers due to high inference speed. The model achieves competitive performance on Terminal-Bench v2.1, positioning it as a leading option for efficient autonomous AI workflows.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama · Jun 7

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI · Jun 4

Fireworks AI now offers NVIDIA Nemotron 3 Ultra, an open model for advanced autonomous agents, with immediate deployment support. This provides developers with optimized infrastructure for long-running agentic tasks that require frontier reasoning and orchestration.