HeadsUpAI

AWS Launches NVIDIA Nemotron 3 Nano Omni for Unified Multimodal Agents

· Updated

AWS launched day-zero availability for NVIDIA Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, mirroring NVIDIA's Nemotron 3 Nano Omni release. This multimodal model uses a Mixture of Experts (MoE) architecture (where only a subset of specialized parameters activates per request) to deliver 30B-parameter performance with 3B active parameters.
Total parameters
30B
Active parameters
3B (MoE)
Context window
131K tokens
Input types
Video, Audio, Image, Text
Precision
FP8
License
NVIDIA Open Model Agreement

This release addresses the "perception bottleneck" in agentic workflows. Most systems rely on separate models for transcription and vision, increasing latency. This update builds on AWS agent orchestration workflows by converging these modalities into one reasoning loop, maintaining a unified context across complex tasks like computer use or real-time video analysis.

You can deploy the model immediately through SageMaker JumpStart, following a pattern seen in OpenRouter's integration. It supports a 131K context window and tool calling, making it a viable backbone for enterprise agentic workflows. The model is licensed under the NVIDIA Open Model Agreement for commercial use.

AWS AI
AWS AI
@AWSAI
X

NVIDIA Nemotron 3 Nano Omni is now available on Amazon SageMaker JumpStart. This multimodal model supports video, audio, image, and text, enabling enterprise Q&A, summarization, transcription, OCR, and document intelligence. With @nvidia Nemotron 3 Nano Omni, organizations can streamline end-to-end processing of meetings, training videos, and documents. https://t.co/XgVkOg6B8x

1retweets2likes
View on X

Still wondering? A few quick answers below.

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model designed for enterprise agentic workflows. It unifies text, image, video, and audio understanding into a single architecture, allowing applications to process multiple data types in one inference pass. This eliminates the need to stitch together separate models for vision and speech tasks.

The model uses a Mamba2 Transformer Hybrid Mixture of Experts architecture with 30 billion total parameters and 3 billion active parameters. It combines the Nemotron 3 Nano language backbone with the CRADIO v4-H vision encoder and Parakeet speech encoder. This design enables high efficiency and throughput while maintaining a unified context across different modalities.

The model supports text, images in JPEG or PNG formats, audio files such as WAV and MP3 up to one hour long, and MP4 videos up to two minutes or 256 frames. It features a 131K token context window and can generate text outputs including transcriptions with word-level timestamps and structured JSON data.

NVIDIA Nemotron 3 Nano Omni is an open model released under the NVIDIA Open Model Agreement, which permits commercial use. While the weights are open, it is not strictly open source in the traditional sense. It is available for deployment as a managed service through Amazon SageMaker JumpStart or as an NVIDIA NIM microservice.

You can deploy the model through Amazon SageMaker JumpStart using the SageMaker Studio console or the SageMaker Python SDK. Deployment requires an AWS account with sufficient GPU service quotas, such as p4d or p5 instances. SageMaker JumpStart provides optimized inference containers in FP8 precision, removing the need to manage underlying infrastructure.

Share this update