Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI

Jun 4, 2026 · Updated Jun 21, 2026

Fireworks AI now offers NVIDIA Nemotron 3 Ultra, an open model for advanced autonomous agents, with immediate deployment support. This provides developers with optimized infrastructure for long-running agentic tasks that require frontier reasoning and orchestration.

Fireworks AI has launched day-zero support for NVIDIA Nemotron 3 Ultra, an open model designed for frontier reasoning and orchestration in long-running autonomous agents. This model features a hybrid Transformer-Mamba Mixture of Experts (MoE) architecture with 550 billion total parameters and up to a 1 million token context window.

Active Parameters: 55B
Agent Productivity PinchBench: 91%
Long-horizon Planning EnterpriseOps-Gym: 33%
Coding Terminal-Bench 2.0: 54%
Long Context Ruler @1M: 95%

The model is optimized for complex, multi-step tasks like coding agents, deep research, and enterprise workflows, where the cost of completing an entire task, not just a single response, is critical. NVIDIA Nemotron 3 was introduced as part of a family of models for agentic AI.

NVIDIA reports Nemotron 3 Ultra achieves 5x faster inference (running a trained AI model to generate outputs) and up to 30% lower cost for agentic tasks compared to other open models in its class. Developers can deploy it on Fireworks AI using on-demand dedicated GPUs, billed by GPU-second.

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQJun 4

NVIDIA Nemotron 3 Ultra is on Fireworks, day zero. Nemotron Ultra is an open model for frontier reasoning and orchestration in long-running autonomous agents. Think use cases like coding agents, deep research, and complex enterprise workflows. Read on: https://t.co/c8mdZwQp49 https://t.co/hQ4PJZ6mvM

450

View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Ultra is an open model developed for advanced reasoning and orchestration within autonomous AI agents. It is designed to handle complex, multi-step tasks without continuous human direction, making it suitable for demanding workflows.

Nemotron 3 Ultra is built for frontier reasoning and orchestrating long-running agentic tasks. It supports a large context window of up to 1 million tokens and is optimized for use cases such as coding agents, deep research, and complex enterprise workflows.

NVIDIA reports Nemotron 3 Ultra delivers 5x faster inference and up to 30% lower cost for agentic tasks compared to other open models in its category. Its hybrid Transformer-Mamba MoE architecture contributes to its efficiency in completing multi-step processes.

Fireworks AI provides day-zero support for Nemotron 3 Ultra. Developers can deploy the model on dedicated GPUs through on-demand deployments, which offer lower latency and predictable performance. Billing is based on GPU-second usage.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Fireworks AI →

Keep reading

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

NVIDIA has released Nemotron 3 Ultra, an open model designed for long-running AI agents, and provided a tutorial for its setup and demonstrations. This model aims to make complex, multi-step agentic workflows faster and more cost-effective by delivering high throughput and efficient reasoning.

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

ArenaJun 5

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

Arena.ai has integrated NVIDIA's Nemotron 3 Ultra model into its Agent Mode, enabling users to run the model for complex, multi-step tasks. These sessions contribute to the new Agent Arena leaderboard, which evaluates agentic AI models on real-world performance using tools like web search and terminal. This expands the range of frontier models available for practical agentic workflows and provides new data for understanding their capabilities in autonomous tasks.

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChainJun 7

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain announced immediate support for NVIDIA Nemotron 3 Ultra, an open frontier model designed for long-running AI agents. This integration makes the model's 5x faster inference and up to 30% lower cost for complex agentic tasks directly available to developers using the LangChain framework.

PerplexityJun 5

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity has made NVIDIA's Nemotron 3 Ultra model available to its Pro and Max subscribers on its platform and Perplexity Computer. This open model is designed for long-running AI agents, offering faster task completion and reduced costs for complex, multi-step workflows.

What is NVIDIA Nemotron 3 Ultra?

What are the key capabilities of Nemotron 3 Ultra?

How does Nemotron 3 Ultra perform on agentic tasks?

How can I access Nemotron 3 Ultra on Fireworks AI?

Keep reading

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Keep reading

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

NVIDIA Nemotron 3 Ultra Powers Faster, Cheaper Reasoning for AI Agents

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

Arena.ai Adds Nemotron 3 Ultra to Agent Mode for Real-World Agent Evaluation

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

LangChain Adds NVIDIA Nemotron 3 Ultra for Faster AI Agents

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI

Perplexity Adds NVIDIA Nemotron 3 Ultra for Faster Agentic AI