Arena.ai Ranks NVIDIA Nemotron 3 Ultra #20 on Agent Arena Leaderboard

Arena

Arena.ai has added NVIDIA's Nemotron 3 Ultra to its Agent Arena leaderboard, where it ranks #20 overall and #5 among open models. The model shows strong tool-use discipline, tying for #1 on the tool-hallucination signal (that is, it rarely hallucinates tool calls). It is held back by weak steerability and bash recovery. The scores are based on 2,849 sessions and carry wide confidence intervals while the data stabilizes.

Agent Arena leaderboard ranking Nemotron 3 Ultra at position 20 with a negative net improvement versus baseline.
Arena.ai (@arena) on X:

The newest open model to join the Agent Arena leaderboard, Nemotron 3 Ultra by @NVIDIA, lands at #20 overall and #5 among open models. Its standout signals are a positive praise-vs-complaint margin and low tool hallucination, but it's held back by steerability and bash recovery. Note the wide confidence intervals, as scores are still stabilizing.

In Agent Arena, models get web search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide decks, researching the web, building apps, and analyzing documents. We use the causal tracing methodology to measure a model's net improvement, which indicates how much it improves outcomes relative to the average model.

See the thread for how Nemotron 3 Ultra scored across 5 signals, drawn from tasks submitted by a global community of users.


Every HeadsUpAI update is written based on its original source and reviewed before it's published.
