Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AIFireworks AI

· Updated

Fireworks AI now offers NVIDIA Nemotron 3 Ultra, an open model for advanced autonomous agents, with immediate deployment support. This provides developers with optimized infrastructure for long-running agentic tasks that require frontier reasoning and orchestration.

Fireworks AI has launched day-zero support for NVIDIA Nemotron 3 Ultra, an open model designed for frontier reasoning and orchestration in long-running autonomous agents. This model features a hybrid Transformer-Mamba Mixture of Experts (MoE) architecture with 550 billion total parameters and up to a 1 million token context window.
Active Parameters
55B
Agent Productivity PinchBench
91%
Long-horizon Planning EnterpriseOps-Gym
33%
Coding Terminal-Bench 2.0
54%
Long Context Ruler @1M
95%

The model is optimized for complex, multi-step tasks like coding agents, deep research, and enterprise workflows, where the cost of completing an entire task, not just a single response, is critical. NVIDIA Nemotron 3 was introduced as part of a family of models for agentic AI.

NVIDIA reports Nemotron 3 Ultra achieves 5x faster inference (running a trained AI model to generate outputs) and up to 30% lower cost for agentic tasks compared to other open models in its class. Developers can deploy it on Fireworks AI using on-demand dedicated GPUs, billed by GPU-second.

Nemotron 3 Ultra (550B) performance benchmark comparison across seven key metrics against GLM 5.1, Kimi K2.6, and Qwen3.5.
Fireworks AI
Fireworks AI
@FireworksAI_HQ
X

NVIDIA Nemotron 3 Ultra is on Fireworks, day zero. Nemotron Ultra is an open model for frontier reasoning and orchestration in long-running autonomous agents. Think use cases like coding agents, deep research, and complex enterprise workflows. Read on: https://t.co/c8mdZwQp49 https://t.co/hQ4PJZ6mvM

4retweets50likes
View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Ultra is an open model developed for advanced reasoning and orchestration within autonomous AI agents. It is designed to handle complex, multi-step tasks without continuous human direction, making it suitable for demanding workflows.

Nemotron 3 Ultra is built for frontier reasoning and orchestrating long-running agentic tasks. It supports a large context window of up to 1 million tokens and is optimized for use cases such as coding agents, deep research, and complex enterprise workflows.

NVIDIA reports Nemotron 3 Ultra delivers 5x faster inference and up to 30% lower cost for agentic tasks compared to other open models in its category. Its hybrid Transformer-Mamba MoE architecture contributes to its efficiency in completing multi-step processes.

Fireworks AI provides day-zero support for Nemotron 3 Ultra. Developers can deploy the model on dedicated GPUs through on-demand deployments, which offer lower latency and predictable performance. Billing is based on GPU-second usage.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Share this update