AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

Amazon Web Services

Apr 28, 2026 · Updated May 6, 2026

AWS added NVIDIA's Nemotron 3 Nano Omni to SageMaker JumpStart, a 30B parameter model that processes video, audio, and text in a single inference pass. By unifying perception into one architecture, the model eliminates the latency and context fragmentation caused by stitching together separate vision and speech models.

AWS launched day-zero availability for NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, mirroring NVIDIA's Nemotron 3 Nano Omni release. This multimodal model uses a Mixture of Experts (MoE) architecture (where only a subset of specialized parameters activates per request) to deliver 30B-parameter performance with 3B active parameters.

Total parameters: 30B
Active parameters: 3B (MoE)
Context window: 131K tokens
Input types: Video, Audio, Image, Text
Precision: FP8
License: NVIDIA Open Model Agreement

This release addresses the "perception bottleneck" in agentic workflows. Most systems rely on separate models for transcription and vision, increasing latency. This update builds on AWS agent orchestration workflows by converging these modalities into one reasoning loop, maintaining a unified context across complex tasks like computer use or real-time video analysis.

You can deploy the model immediately through SageMaker JumpStart, following a pattern seen in OpenRouter's integration. It supports a 131K context window and tool calling, making it a viable backbone for enterprise agentic workflows. The model is licensed under the NVIDIA Open Model Agreement for commercial use.

View the full update on aws.amazon.com

AWS AI

@AWSAIApr 28

NVIDIA Nemotron 3 Nano Omni is now available on Amazon SageMaker JumpStart. This multimodal model supports video, audio, image, and text, enabling enterprise Q&A, summarization, transcription, OCR, and document intelligence. With @nvidia Nemotron 3 Nano Omni, organizations can streamline end-to-end processing of meetings, training videos, and documents. https://t.co/XgVkOg6B8x

View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model designed for enterprise agentic workflows. It unifies text, image, video, and audio understanding into a single architecture, allowing applications to process multiple data types in one inference pass. This eliminates the need to stitch together separate models for vision and speech tasks.

The model uses a Mamba2 Transformer Hybrid Mixture of Experts architecture with 30 billion total parameters and 3 billion active parameters. It combines the Nemotron 3 Nano language backbone with the CRADIO v4-H vision encoder and Parakeet speech encoder. This design enables high efficiency and throughput while maintaining a unified context across different modalities.

The model supports text, images in JPEG or PNG formats, audio files such as WAV and MP3 up to one hour long, and MP4 videos up to two minutes or 256 frames. It features a 131K token context window and can generate text outputs including transcriptions with word-level timestamps and structured JSON data.

NVIDIA Nemotron 3 Nano Omni is an open model released under the NVIDIA Open Model Agreement, which permits commercial use. While the weights are open, it is not strictly open source in the traditional sense. It is available for deployment as a managed service through Amazon SageMaker JumpStart or as an NVIDIA NIM microservice.

You can deploy the model through Amazon SageMaker JumpStart using the SageMaker Studio console or the SageMaker Python SDK. Deployment requires an AWS account with sufficient GPU service quotas, such as p4d or p5 instances. SageMaker JumpStart provides optimized inference containers in FP8 precision, removing the need to manage underlying infrastructure.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from AWS →

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA released Nemotron 3 Nano Omni, a 30B-parameter multimodal model that unifies text, image, video, and audio understanding into a single architecture. By activating only 3 billion parameters during inference, the model delivers high-efficiency reasoning across a 256K context window for complex agentic workflows.

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouterApr 28

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter integrated NVIDIA's Nemotron 3 Nano Omni, a 30B-A3B model that natively processes text, audio, and video in a single inference loop. By using a hybrid Transformer-Mamba architecture, the model reduces the compute cost of video reasoning by 2.5x, making it a high-efficiency perception layer for autonomous agents.

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

OllamaJun 7

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama has made NVIDIA's Nemotron 3 Ultra model available on its cloud. This 550 billion parameter Mixture of Experts (MoE) model is designed for long-running AI agents, delivering 5x faster inference and up to 30% lower costs for complex agentic tasks.

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization

Amazon Web ServicesMay 5

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization

AWS launched agent-guided model customization in Amazon SageMaker AI, enabling users to manage the entire fine-tuning lifecycle through natural language. The system uses specialized agent skills to automate data transformation, hyperparameter tuning, and deployment across popular coding environments. This shift reduces the time required to adapt foundation models from months to just a few days.

What is NVIDIA Nemotron 3 Nano Omni?

How does the Nemotron 3 Nano Omni architecture work?

What input types does Nemotron 3 Nano Omni support?

Is NVIDIA Nemotron 3 Nano Omni open source?

How do you deploy Nemotron 3 Nano Omni on AWS?

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization

Keep reading

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

NVIDIA Launches Nemotron 3 Nano Omni for Efficient Multimodal Sub-Agents

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

Ollama Adds NVIDIA Nemotron 3 Ultra for Faster, Cheaper AI Agents

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization

AWS Launches Agentic SageMaker Workflows to Automate End-to-End Model Customization