HeadsUpAI

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

· Updated

OpenRouter, a unified API for accessing hundreds of language models, launched free access to NVIDIA's Nemotron 3 Nano Omni. This 30B-A3B Mixture-of-Experts (MoE) model (an architecture activating only a fraction of its parameters) accepts text, image, video, and audio inputs to produce text outputs.
Model architecture
Hybrid Transformer-Mamba MoE
Parameter count
30B-A3B (30B total, 3B active)
Context window
256,000 tokens
Reasoning budget
16,384 tokens
Input modalities
Text, image, video, audio
Pricing
$0 per million tokens

The model uses a hybrid Transformer-Mamba architecture (a design combining attention with memory-efficient sequence modeling) to deliver 2x higher throughput than separate vision and speech pipelines. This design reduces video reasoning compute by 2.5x, matching NVIDIA's Nemotron 3 Nano Omni release and adds to OpenRouter's Tencent Hy3-Preview hosting.

Use the model as a perception sub-agent to extract context from multimodal data. It features a 256k context window and a 16,384-token reasoning budget for internal thinking. The availability mirrors the AWS SageMaker JumpStart integration, with extended thinking enabled via the reasoning parameter in the API.

OpenRouter
OpenRouter
@OpenRouter
X

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning. https://t.co/nLZhy3c3Xv

17retweets107likes
View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed as a perception sub-agent for enterprise systems. It natively processes text, image, video, and audio inputs to produce text outputs. The model uses a Mixture-of-Experts architecture, meaning it has 30 billion total parameters but only activates 3 billion during each inference loop.

The model is built on a hybrid Transformer-Mamba architecture, which combines standard attention with memory-efficient sequence modeling. It incorporates Conv3D video layers and Efficient Video Sampling to handle complex visual data. This design allows the model to achieve approximately 2x higher throughput and 2.5x lower compute costs for video reasoning tasks compared to using separate vision and speech pipelines.

Yes, the model is currently available for free on OpenRouter. Users can access the model via the API at zero cost for both input and output tokens. However, users should note that data sent to the free endpoint may be used by NVIDIA to improve their products and services according to their trial terms of service.

The model features a 256,000-token context window, allowing it to process large documents or long video sequences. Additionally, it supports an internal reasoning budget of 16,384 tokens. This enables extended thinking where the model performs step-by-step reasoning before providing a final answer, which can be enabled using the reasoning parameter on the OpenRouter platform.

No, Nemotron 3 Nano Omni is an any-to-text model. While it can perceive and reason across multiple input modalities—including text, images, video, and audio—it only produces text as an output. This makes it suitable for tasks like document intelligence, video reasoning, and acting as a context-aware sub-agent within larger enterprise agent systems.

Share this update