NVIDIA Dynamo 1.0 Ships as Open-Source Inference OS for AI Factories

NVIDIA

Mar 18, 2026 · Updated Apr 25, 2026

NVIDIA Dynamo 1.0 reaches production as open-source software that orchestrates GPU clusters for AI inference at data center scale. It boosts Blackwell GPU inference performance by up to 7x and already runs on AWS, Azure, Google Cloud, and OCI.

NVIDIA Dynamo 1.0 is a distributed inference orchestration layer that coordinates GPU and memory resources across an entire cluster. It routes each request to a GPU that already holds relevant cached context from earlier steps, offloads that context to lower-cost storage when it sits idle, and splits workloads with smarter traffic control, which reduces wasted compute and eases memory limits. On Blackwell GPUs, according to NVIDIA, these optimizations raise inference performance by up to 7x.
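
To make the cache-aware routing idea concrete, here is a minimal Python sketch. It is not Dynamo's router or API. The `Worker` class, the `pick_worker` function, and the `load_penalty` weight are all hypothetical names invented for illustration. The sketch assumes that a router scores each GPU by how many leading prompt tokens it already has cached, minus a penalty for requests it is already serving.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical GPU worker: tracks cached token prefixes and current load."""
    name: str
    cached_prefixes: list = field(default_factory=list)
    active_requests: int = 0


def shared_prefix_len(a, b):
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(workers, prompt_tokens, load_penalty=2.0):
    """Pick the worker with the best (cache reuse - load penalty) score."""
    def score(w):
        # Best overlap between the prompt and any prefix this worker has cached.
        reuse = max(
            (shared_prefix_len(p, prompt_tokens) for p in w.cached_prefixes),
            default=0,
        )
        return reuse - load_penalty * w.active_requests

    return max(workers, key=score)


workers = [
    Worker("gpu-0", cached_prefixes=[(1, 2, 3, 4, 5)], active_requests=1),
    Worker("gpu-1", cached_prefixes=[(9, 9)], active_requests=0),
]
chosen = pick_worker(workers, prompt_tokens=(1, 2, 3, 4, 7))
print(chosen.name)  # gpu-0: four cached tokens outweigh one active request
```

In this toy setup a busier worker still wins when it can reuse enough cached context, and a heavily loaded one loses to an idle worker. A production router would weigh many more signals than these two.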

Dynamo integrates natively with popular open-source frameworks including vLLM, SGLang, LMCache, and LangChain, and offers standalone modules such as KVBM for memory management and NIXL for GPU-to-GPU data movement. Adoption spans AWS, Azure, Google Cloud, and OCI; AI-native companies such as Cursor and Perplexity; inference providers Baseten, Deep Infra, and Fireworks; and enterprises including ByteDance, PayPal, and Pinterest.

Drop Dynamo's KVBM module into your existing vLLM setup to handle KV cache management on its own. It is a low-risk way to test distributed memory optimization without replacing the rest of your inference stack.

View the full update on nvidianews.nvidia.com

NVIDIA Newsroom

@nvidianewsroom · Mar 16

#NVIDIAGTC news: NVIDIA Dynamo 1.0 enters production as the broadly adopted inference operating system for AI factories. Dynamo 1.0 boosts Blackwell inference performance by up to 7x. The industry is scaling on NVIDIA. ⬇️ https://t.co/Iaq2H2SmhR

Keep reading

NVIDIA Launches Dynamo 1.0 as Open Source Inference OS for AI Factories

NVIDIA released Dynamo 1.0, open-source software that acts as the distributed "operating system" for AI inference at scale and boosts Blackwell GPU performance by up to 7x. AWS, Google Cloud, ByteDance, and PayPal are already running it in production.

Nebius Sets New Inference Performance Standards for Blackwell and Blackwell Ultra

Nebius · Apr 1

Nebius secured 10 first-place finishes in the MLPerf Inference v6.0 benchmarks using the latest NVIDIA Blackwell and Blackwell Ultra systems. These results demonstrate linear performance scaling for frontier models like DeepSeek R1, providing a verified blueprint for high-throughput production AI infrastructure.

Perplexity Benchmarks Blackwell Performance for High Throughput Large Model Inference

Perplexity · May 12

Perplexity published research showing that NVIDIA's GB200 Blackwell architecture nearly halves communication latency for large Mixture-of-Experts models compared to the previous generation. The findings suggest that Blackwell is a primary platform for reducing the cost and latency of serving frontier-scale AI search.

Amazon Web Services · Mar 18

AWS First Major Cloud Provider Supporting NVIDIA RTX PRO 4500 Blackwell GPUs

Amazon EC2 will add instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, making AWS the first major cloud provider to announce support. These instances suit workloads like conversational AI, data analytics, content generation, and graphics.