NVIDIA Launches Dynamo 1.0 as Open Source Inference OS for AI Factories

NVIDIA

Mar 20, 2026 · Updated Apr 28, 2026

NVIDIA released Dynamo 1.0, open source software that acts as the distributed "operating system" for AI inference at scale, boosting inference performance on Blackwell GPUs by up to 7x. AWS, Google Cloud, ByteDance, and PayPal are already running it in production.

NVIDIA launched Dynamo 1.0, open source software for generative and agentic inference at scale. Like an operating system coordinating hardware, Dynamo orchestrates GPU and memory resources across entire clusters: it routes requests, moves data between GPUs and lower-cost storage, and handles unpredictable agentic AI workloads. In benchmarks, Dynamo boosted inference performance of NVIDIA Blackwell GPUs by up to 7x, lowering cost per token.

Adoption spans the ecosystem. Cloud providers AWS, Microsoft Azure, Google Cloud, and Oracle Cloud have integrated Dynamo into their infrastructure, as have AI-native companies Cursor and Perplexity, inference providers Fireworks and Baseten, and global enterprises ByteDance, PayPal, and Pinterest. Dynamo and TensorRT-LLM optimizations are also natively integrated into open source frameworks including SGLang, vLLM, and LangChain.

If you're running inference at scale, whether on a cloud platform, through an inference endpoint provider, or on your own infrastructure, Dynamo is worth evaluating as the orchestration layer between your models and your GPUs.

View the full update on nvidianews.nvidia.com

NVIDIA Newsroom

@nvidianewsroom · Mar 17

AI factories have a new inference engine, NVIDIA Dynamo. Dynamo 1.0 is a production-grade, open source "operating system" that boosts inference performance up to 7x—lowering token cost and increasing revenue opportunity. Learn how the AI ecosystem is deploying Dynamo 🧵

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Nebius · Apr 1

Nebius Sets New Inference Performance Standards for Blackwell and Blackwell Ultra

Nebius secured 10 first-place finishes in the MLPerf Inference v6.0 benchmarks using the latest NVIDIA Blackwell and Blackwell Ultra systems. These results demonstrate linear performance scaling for frontier models like DeepSeek R1, providing a verified blueprint for high-throughput production AI infrastructure.

Perplexity · May 7

Perplexity Launches ROSE Inference Engine to Optimize Blackwell GPU Performance

Perplexity developed a custom inference engine called ROSE and a domain-specific language to build specialized GPU kernels for NVIDIA hardware. By moving down the stack, the company can achieve peak performance on Blackwell chips and reduce latency for massive trillion-parameter models.

Amazon Web Services · Mar 18

AWS First Major Cloud Provider Supporting NVIDIA RTX PRO 4500 Blackwell GPUs

Amazon EC2 will add instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, making AWS the first major cloud provider to announce support. These instances suit workloads like conversational AI, data analytics, content generation, and graphics.