The newest open model to join the Agent Arena leaderboard, Nemotron 3 Ultra by @NVIDIA, lands at #20 overall and #5 among open models. Its standout signals are a positive praise-vs-complaint margin and low tool hallucination, but it's held back by steerability and bash recovery. Note the wide confidence intervals, as scores are still stabilizing. In Agent Arena, models get web search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide decks, researching the web, building apps, and analyzing documents. We use the causal tracing methodology to measure a model's net improvement, which indicates how much it improves outcomes relative to the average model. See in thread how Nemotron 3 Ultra scored across 5 signals, drawn from tasks submitted by a global community of users.
Arena.ai Ranks NVIDIA Nemotron 3 Ultra #20 on Agent Arena Leaderboard
Arena.ai added NVIDIA's Nemotron 3 Ultra to the Agent Arena leaderboard, where it ranks #20 overall and #5 among open models. The model shows strong tool-use discipline, tying for #1 in tool hallucination, but struggles with steerability and bash recovery. These scores, based on 2,849 sessions, remain subject to wide confidence intervals as data stabilizes.





