NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA

Apr 28, 2026 · Updated May 6, 2026

NVIDIA released Nemotron 3 Nano Omni, a 30B-parameter multimodal model that unifies text, image, video, and audio understanding into a single architecture. By activating only 3 billion parameters during inference, the model delivers high-efficiency reasoning across a 256K context window for complex agentic workflows.

NVIDIA launched Nemotron 3 Nano Omni, an open multimodal foundation model with 30 billion parameters. It uses a hybrid Mixture-of-Experts (MoE) architecture (a system that activates only specialized sub-networks for each task) to run only 3 billion parameters during inference. This design unifies vision and audio encoders directly into the model backbone.

Total parameters: 30B
Active parameters: 3B
Context window: 256K tokens
Architecture: Hybrid Mixture-of-Experts (MoE)
Supported modalities: Text, Image, Video, Audio
Availability: Open model, Nemotron Labs

This release brings native understanding of video and audio to the open model ecosystem. It mirrors Alibaba's Qwen3.6-35B-A3B by prioritizing inference speed without sacrificing knowledge depth. The 256K context window allows agents to process massive multimodal datasets or long-form audio clips locally.

Use this model to power sub-agents that require fast OCR, speech recognition, or video analysis. It is optimized for high-throughput environments and builds on NVIDIA's Dynamo inference stack. The model is available now for enterprise deployment, matching OpenRouter's integration. NVIDIA is hosting livestreams on May 5 and May 12 to demonstrate implementation.

View the full update on blogs.nvidia.com

NVIDIA AI

@NVIDIAAIApr 28

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇 https://t.co/j4SPpU9SaI

93711

View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Nano Omni is an open multimodal foundation model designed for high-efficiency reasoning. It unifies text, image, video, and audio understanding into a single architecture. The model is specifically built to power sub-agents that need to perform fast, complex tasks like optical character recognition or speech-to-text within larger autonomous workflows.

The model uses a 30B-A3B hybrid Mixture-of-Experts architecture. While it contains 30 billion total parameters for broad knowledge, it only activates 3 billion parameters during inference. This sparse activation allows the model to maintain the reasoning capabilities of a larger system while operating with the speed and lower compute requirements of a much smaller model.

Nemotron 3 Nano Omni is a natively multimodal model that unifies reasoning across text, images, video, and audio. Unlike systems that stitch together separate models for different tasks, it combines vision and audio encoders within its core architecture. This allows it to see, hear, and read information simultaneously to support enterprise-grade question and answering.

Yes, NVIDIA has released Nemotron 3 Nano Omni as an open model. It is currently available for developers to use through platforms like Nemotron Labs and SGLang. NVIDIA is also hosting technical livestreams in May 2026 to show the developer community how to build specialized sub-agents using the model's efficient 30B-A3B architecture.

Nemotron 3 Nano Omni features a 256K context length, which is exceptionally large for a model with only 3 billion active parameters. This massive window enables the model to process and reason over long documents, extended audio recordings, or high-resolution video clips in a single pass, making it ideal for complex data analysis tasks.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from NVIDIA →

Keep reading

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS added NVIDIA's Nemotron 3 Nano Omni to SageMaker JumpStart, a 30B parameter model that processes video, audio, and text in a single inference pass. By unifying perception into one architecture, the model eliminates the latency and context fragmentation caused by stitching together separate vision and speech models.

NVIDIAJun 4

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA has shipped Nemotron 3 Ultra, a 550B Mixture-of-Experts (MoE) open model designed for long-running AI agents. This model delivers 5x faster inference and up to 30% lower cost for complex agentic tasks compared to other open frontier models, aiming to make autonomous workflows more efficient and accessible.

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouterApr 28

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter integrated NVIDIA's Nemotron 3 Nano Omni, a 30B-A3B model that natively processes text, audio, and video in a single inference loop. By using a hybrid Transformer-Mamba architecture, the model reduces the compute cost of video reasoning by 2.5x, making it a high-efficiency perception layer for autonomous agents.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

OllamaJun 7

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

What is NVIDIA Nemotron 3 Nano Omni?

How does the Nemotron 3 Nano Omni architecture work?

What modalities does Nemotron 3 Nano Omni support?

Is Nemotron 3 Nano Omni available to the public?

What is the context window for Nemotron 3 Nano Omni?

Keep reading

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Keep reading

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

NVIDIA Ships Nemotron 3 Ultra for 5x Faster, Cheaper AI Agents

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents