OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter

Apr 28, 2026 · Updated May 6, 2026

OpenRouter integrated NVIDIA's Nemotron 3 Nano Omni, a 30B-A3B model that natively processes text, audio, and video in a single inference loop. By using a hybrid Transformer-Mamba architecture, the model reduces the compute cost of video reasoning by 2.5x, making it a high-efficiency perception layer for autonomous agents.

OpenRouter, a unified API for accessing hundreds of language models, launched free access to NVIDIA's Nemotron 3 Nano Omni. This 30B-A3B Mixture-of-Experts (MoE) model (an architecture activating only a fraction of its parameters) accepts text, image, video, and audio inputs to produce text outputs.

Model architecture: Hybrid Transformer-Mamba MoE
Parameter count: 30B-A3B (30B total, 3B active)
Context window: 256,000 tokens
Reasoning budget: 16,384 tokens
Input modalities: Text, image, video, audio
Pricing: $0 per million tokens

The model uses a hybrid Transformer-Mamba architecture (a design combining attention with memory-efficient sequence modeling) to deliver 2x higher throughput than separate vision and speech pipelines. This design reduces video reasoning compute by 2.5x, matching NVIDIA's Nemotron 3 Nano Omni release and adds to OpenRouter's Tencent Hy3-Preview hosting.

Use the model as a perception sub-agent to extract context from multimodal data. It features a 256k context window and a 16,384-token reasoning budget for internal thinking. The availability mirrors the AWS SageMaker JumpStart integration, with extended thinking enabled via the reasoning parameter in the API.

View the full update on openrouter.ai

OpenRouter

@OpenRouterApr 28

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning. https://t.co/nLZhy3c3Xv

17107

View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed as a perception sub-agent for enterprise systems. It natively processes text, image, video, and audio inputs to produce text outputs. The model uses a Mixture-of-Experts architecture, meaning it has 30 billion total parameters but only activates 3 billion during each inference loop.

The model is built on a hybrid Transformer-Mamba architecture, which combines standard attention with memory-efficient sequence modeling. It incorporates Conv3D video layers and Efficient Video Sampling to handle complex visual data. This design allows the model to achieve approximately 2x higher throughput and 2.5x lower compute costs for video reasoning tasks compared to using separate vision and speech pipelines.

Yes, the model is currently available for free on OpenRouter. Users can access the model via the API at zero cost for both input and output tokens. However, users should note that data sent to the free endpoint may be used by NVIDIA to improve their products and services according to their trial terms of service.

The model features a 256,000-token context window, allowing it to process large documents or long video sequences. Additionally, it supports an internal reasoning budget of 16,384 tokens. This enables extended thinking where the model performs step-by-step reasoning before providing a final answer, which can be enabled using the reasoning parameter on the OpenRouter platform.

No, Nemotron 3 Nano Omni is an any-to-text model. While it can perceive and reason across multiple input modalities—including text, images, video, and audio—it only produces text as an output. This makes it suitable for tasks like document intelligence, video reasoning, and acting as a context-aware sub-agent within larger enterprise agent systems.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenRouter →

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA released Nemotron 3 Nano Omni, a 30B-parameter multimodal model that unifies text, image, video, and audio understanding into a single architecture. By activating only 3 billion parameters during inference, the model delivers high-efficiency reasoning across a 256K context window for complex agentic workflows.

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouterMay 12

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouter integrated Perceptron Mk1, a vision-language model designed for spatial grounding and video understanding with structured outputs like bounding boxes and timestamps. The model introduces a reasoning toggle for complex visual tasks at a significantly lower price point than general-purpose frontier models.

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

Amazon Web ServicesApr 28

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS added NVIDIA's Nemotron 3 Nano Omni to SageMaker JumpStart, a 30B parameter model that processes video, audio, and text in a single inference pass. By unifying perception into one architecture, the model eliminates the latency and context fragmentation caused by stitching together separate vision and speech models.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

OllamaJun 7

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

What is NVIDIA Nemotron 3 Nano Omni?

How does the Nemotron 3 Nano Omni architecture work?

Is NVIDIA Nemotron 3 Nano Omni free to use?

What is the context window for Nemotron 3 Nano Omni?

Can Nemotron 3 Nano Omni generate images or audio?

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents