HeadsUpAI

NVIDIA Nemotron 3 Super Tops Open Source Leaderboard for Enterprise Agents


NVIDIA announced that its Nemotron 3 Super model has topped the open-source category on the EnterpriseOps-Gym leaderboard. The benchmark is an agentic gauntlet (a test of autonomous decision-making) that evaluates models across 1,150 tasks in fully interactive environments. To complete a workflow, the model must navigate 512 functional tools and coordinate across multiple enterprise systems.

Tasks evaluated: 1,150
Functional tools: 512
Total parameters: 120 billion
Active parameters: 12 billion
Availability: Open weights

The shift to autonomous agents requires models that handle long-horizon reasoning and tool use. By outperforming other open-weight models on this benchmark, Nemotron 3 Super shows it can manage the "plumbing" of enterprise work: coordinating tools and systems to finish a workflow. It extends the multimodal capabilities of Nemotron 3 Nano Omni and adds to the OpenShell security runtime.

Use Nemotron 3 Super as a high-capacity engine for production-grade enterprise automation, following the release of NVIDIA's supply chain agentic workflow. The model is a 120-billion-parameter Mixture-of-Experts designed for high-throughput inference. It is available as an open-weight model for local deployment or cloud-based agentic sessions.

NVIDIA AI (@NVIDIAAI) on X:

Benchmarks should reflect real-world performance. That’s why we’re excited to share that Nemotron 3 Super has topped the open source category on the EnterpriseOps-Gym leaderboard. This agentic gauntlet evaluates performance across 1,150 tasks in fully interactive environments with 512 functional tools, requiring agents to coordinate across multiple enterprise systems and tools to complete a single workflow. 📊 https://t.co/wt54NRNgeK

28 retweets, 234 likes

Still wondering? A few quick answers below.

Nemotron 3 Super is a 120-billion-parameter open-weight model designed for complex enterprise AI tasks. It uses a Mixture-of-Experts architecture, meaning it only activates 12 billion parameters during each forward pass to maintain high efficiency. The model is specifically optimized for agentic workflows that require reasoning across multiple systems and tools.
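To make the "only some parameters are active" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind Mixture-of-Experts layers. It is a toy illustration, not NVIDIA's implementation: the number of experts, the value of `k`, the gate scores, and the function names (`route_top_k`, `moe_forward`) are all assumptions made for the example. Only the 120-billion and 12-billion figures come from the article.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(gate_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, k):
    """Run only the chosen experts; return their weighted sum and their indices."""
    chosen = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]

# Toy setup (hypothetical): 10 "experts", each of which just scales its input.
experts = [lambda x, scale=scale: x * scale for scale in range(1, 11)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0, 0.2, -2.0]

output, used = moe_forward(1.0, experts, gate_scores, k=2)
print(f"experts used: {used} of {len(experts)}")

# Figures from the article: 12 billion active out of 120 billion total.
active_fraction = 12e9 / 120e9
print(f"active fraction per token: {active_fraction:.0%}")
```

The point of the sketch is that the eight unselected experts are never called, so their parameters cost nothing for that token. With the article's figures, roughly one tenth of the model's parameters take part in each forward pass.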

EnterpriseOps-Gym is an agentic evaluation framework that tests how well AI models function as autonomous agents in real-world environments. It consists of 1,150 interactive tasks and provides access to 512 functional tools. To succeed, an agent must coordinate across various enterprise systems and tools to complete a single, unified workflow.
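The kind of task described here can be pictured as a chain of tool calls that must all succeed across more than one system. The sketch below is a simplified, hypothetical illustration and does not reflect the actual EnterpriseOps-Gym harness: the two in-memory "systems", the three tools, and the `run_workflow` helper are invented for the example, and a fixed plan stands in for the decisions a model would make.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for two enterprise systems.
TICKETS: Dict[str, dict] = {"T-1": {"status": "open", "owner": None}}
DIRECTORY: Dict[str, dict] = {"alice": {"team": "it-support"}}

def lookup_employee(name: str) -> dict:
    """Directory system: return the employee record or an error."""
    record = DIRECTORY.get(name)
    return {"name": name, **record} if record else {"error": f"no employee {name}"}

def assign_ticket(ticket_id: str, owner: str) -> dict:
    """Ticketing system: set the owner of an existing ticket."""
    ticket = TICKETS.get(ticket_id)
    if ticket is None:
        return {"error": f"no ticket {ticket_id}"}
    ticket["owner"] = owner
    return {"ticket_id": ticket_id, **ticket}

def close_ticket(ticket_id: str) -> dict:
    """Ticketing system: close a ticket, but only if it has an owner."""
    ticket = TICKETS.get(ticket_id)
    if ticket is None:
        return {"error": f"no ticket {ticket_id}"}
    if ticket["owner"] is None:
        return {"error": "cannot close an unassigned ticket"}
    ticket["status"] = "closed"
    return {"ticket_id": ticket_id, **ticket}

TOOLS: Dict[str, Callable[..., dict]] = {
    "lookup_employee": lookup_employee,
    "assign_ticket": assign_ticket,
    "close_ticket": close_ticket,
}

def run_workflow(plan: List[Tuple[str, dict]],
                 tools: Dict[str, Callable[..., dict]]) -> List[Tuple[str, dict]]:
    """Execute tool calls in order and stop at the first error."""
    trace = []
    for name, kwargs in plan:
        tool = tools.get(name)
        result = tool(**kwargs) if tool else {"error": f"unknown tool {name}"}
        trace.append((name, result))
        if "error" in result:
            break
    return trace

# In a real agentic run the model chooses each call; this plan is scripted.
plan = [
    ("lookup_employee", {"name": "alice"}),
    ("assign_ticket", {"ticket_id": "T-1", "owner": "alice"}),
    ("close_ticket", {"ticket_id": "T-1"}),
]
trace = run_workflow(plan, TOOLS)
succeeded = len(trace) == len(plan) and "error" not in trace[-1][1]
print("workflow succeeded:", succeeded)
```

Even in this three-tool toy, skipping the assignment step makes the final call fail, which suggests why a benchmark with hundreds of tools rewards models that sequence calls correctly rather than ones that simply call the right tool once.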

Nemotron 3 Super is released as an open-weight model, making its trained parameters available for developers to download and run. NVIDIA also provides the data and recipes used for the model. This allows organizations to deploy the model on their own infrastructure or through supported API providers like Together AI and OpenRouter.

Nemotron 3 Super currently holds the top position in the open-source category on the EnterpriseOps-Gym leaderboard. According to NVIDIA, the model delivers significantly higher throughput than competing models like Qwen or GPT-OSS. Its hybrid architecture allows it to maintain high accuracy on reasoning benchmarks while processing tokens at a much faster rate.

The model features 120 billion total parameters but operates with 12 billion active parameters per token using a Mixture-of-Experts design. It was pre-trained on 25 trillion tokens and underwent post-training using supervised fine-tuning and reinforcement learning. This combination enables the model to handle complex tool-calling and multi-step reasoning tasks efficiently.
