HeadsUpAI

NVIDIA Launches Dynamo 1.0 as Open Source Inference OS for AI Factories

NVIDIA launched Dynamo 1.0, open source software for generative and agentic inference at scale. Like an operating system coordinating hardware, Dynamo orchestrates GPU and memory resources across entire clusters — routing requests, moving data between GPUs and lower-cost storage, and handling unpredictable agentic AI workloads. In benchmarks, Dynamo boosted inference performance of NVIDIA Blackwell GPUs by up to 7x, lowering cost per token.

Adoption across the ecosystem is already broad. Cloud providers AWS, Microsoft Azure, Google Cloud, and Oracle Cloud have integrated Dynamo into their infrastructure, as have AI-native companies Cursor and Perplexity, inference providers Fireworks and Baseten, and global enterprises ByteDance, PayPal, and Pinterest. Dynamo, together with NVIDIA's TensorRT-LLM optimizations, integrates natively with open source frameworks such as SGLang, vLLM, and LangChain.

If you're running inference at scale, whether in the cloud, through endpoint providers, or on your own infrastructure, Dynamo is worth evaluating as the orchestration layer between your models and your GPUs.

NVIDIA Newsroom (@nvidianewsroom) on X:

AI factories have a new inference engine, NVIDIA Dynamo. Dynamo 1.0 is a production-grade, open source "operating system" that boosts inference performance up to 7x—lowering token cost and increasing revenue opportunity. Learn how the AI ecosystem is deploying Dynamo 🧵
