There's a better way to serve your inference stack; you just haven't found it yet. DynoSim is a workload-driven simulation of the Dynamo serving stack that turns exhaustive deployment search into a simulate-then-verify loop. Instead of testing every deployment choice, teams can model the whole stack on one virtual timeline, screen thousands of configurations in high-fidelity simulation, then validate only the best candidates on real hardware. And because it's implemented entirely in Rust, it runs extremely fast: 1,500x faster than real time in our testing.
NVIDIA DynoSim Simulates LLM Inference Stacks 1,500x Faster Than Real-Time
- Simulation speed: 1,500x faster than real-time
- Modeled components: Router, Planner, KVBM, and Schedulers
- Supported engines: vLLM and SGLang
- Simulation throughput: 23,608 requests in 2.41 seconds
- Metrics tracked: TTFT, TPOT, TPS, and cache reuse
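To put the headline figures in perspective, here is a small back-of-the-envelope sketch. The function names are made up for illustration, and the second calculation assumes the 1,500x speedup and the 2.41-second run describe the same replay, which the figures above do not state explicitly.

```python
def wall_clock_throughput(requests: int, wall_seconds: float) -> float:
    """Simulated requests processed per second of wall-clock time."""
    return requests / wall_seconds


def implied_simulated_seconds(wall_seconds: float, speedup: float) -> float:
    """Span of traffic covered by a run, ASSUMING the quoted speedup applies to it."""
    return wall_seconds * speedup


# Figures quoted in the list above.
rate = wall_clock_throughput(23_608, 2.41)
span = implied_simulated_seconds(2.41, 1_500)

print(f"{rate:,.0f} simulated requests per wall-clock second")
print(f"about {span / 60:.0f} minutes of traffic replayed, if the assumption holds")
```

Under that assumption, the 2.41-second run would correspond to roughly an hour of real traffic, which is what makes screening thousands of configurations practical.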
Tuning modern LLM deployments is a massive search problem where local improvements often shift bottlenecks elsewhere in the stack. DynoSim replaces exhaustive hardware testing with a simulate-then-verify loop, mapping the Pareto frontier: the set of configurations where cost cannot be reduced without giving up performance, and vice versa. It provides accurate predictions for metrics like Time to First Token (TTFT) by modeling engine-specific behavior.
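As a rough illustration of the screening step, the sketch below filters a batch of simulated results down to its Pareto frontier over two metrics, cost and TTFT. It is not DynoSim's API: the `SimResult` type, its fields, and the sample numbers are all hypothetical, and a real search would likely track more metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimResult:
    """Hypothetical summary of one simulated configuration (lower is better for both)."""
    name: str
    cost_per_hour: float
    ttft_ms: float


def dominates(a: SimResult, b: SimResult) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.cost_per_hour <= b.cost_per_hour and a.ttft_ms <= b.ttft_ms
    better = a.cost_per_hour < b.cost_per_hour or a.ttft_ms < b.ttft_ms
    return no_worse and better


def pareto_frontier(results: list[SimResult]) -> list[SimResult]:
    """Keep only the configurations that no other configuration dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers, for illustration only.
candidates = [
    SimResult("A", cost_per_hour=10.0, ttft_ms=300.0),
    SimResult("B", cost_per_hour=12.0, ttft_ms=200.0),
    SimResult("C", cost_per_hour=12.0, ttft_ms=350.0),  # dominated by A
    SimResult("D", cost_per_hour=20.0, ttft_ms=120.0),
    SimResult("E", cost_per_hour=22.0, ttft_ms=200.0),  # dominated by B and D
]

shortlist = pareto_frontier(candidates)
print([r.name for r in shortlist])
```

In a simulate-then-verify workflow, only the shortlist would go on to real hardware. The quadratic scan is fine for a few thousand candidates; a sort-based sweep would scale better for larger batches.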
You can use the simulator to optimize autoscaling intervals or quantify how Dynamo Snapshot's cold-start reductions impact traffic bursts. The framework also supports an autoresearch loop where AI agents propose and score algorithmic changes to routers. Technical guides for the mocker replay and planner components are now available.