NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning. https://t.co/nLZhy3c3Xv
OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning
· Updated
OpenRouter integrated NVIDIA's Nemotron 3 Nano Omni, a 30B-A3B model that natively processes text, audio, and video in a single inference loop. By using a hybrid Transformer-Mamba architecture, the model reduces the compute cost of video reasoning by 2.5x, making it a high-efficiency perception layer for autonomous agents.
- Model architecture
- Hybrid Transformer-Mamba MoE
- Parameter count
- 30B-A3B (30B total, 3B active)
- Context window
- 256,000 tokens
- Reasoning budget
- 16,384 tokens
- Input modalities
- Text, image, video, audio
- Pricing
- $0 per million tokens
The model uses a hybrid Transformer-Mamba architecture (a design combining attention with memory-efficient sequence modeling) to deliver 2x higher throughput than separate vision and speech pipelines. This design reduces video reasoning compute by 2.5x, matching NVIDIA's Nemotron 3 Nano Omni release and adds to OpenRouter's Tencent Hy3-Preview hosting.
Use the model as a perception sub-agent to extract context from multimodal data. It features a 256k context window and a 16,384-token reasoning budget for internal thinking. The availability mirrors the AWS SageMaker JumpStart integration, with extended thinking enabled via the reasoning parameter in the API.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →





