Cold starts for large models are one of the hardest problems in AI inference infrastructure. Today we're launching the Baseten Delivery Network (BDN) to tackle a major part of that problem: BDN delivers 2–3x faster cold starts for large models at scale through optimizations at the pod, node, and cluster levels.
Baseten Delivery Network Cuts Large Model Cold Starts by 2–3x
Baseten launched the Baseten Delivery Network (BDN), which makes cold starts for large models 2–3x faster. BDN removes weight-download bottlenecks with three-tier caching and uses single-flight downloads to prevent thundering herd problems during burst scaling. It is available to all Baseten customers today.
BDN targets the thundering herd problem: when 50–100 replicas simultaneously pull the same hundreds of gigabytes from origin at scale-up, they saturate the bandwidth available to every replica. Single-flight assigns one responsible fetcher per file; the rest load from the local cache tier instead of racing to origin.
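To make the single-flight idea concrete, here is a minimal Python sketch. It is not Baseten's implementation: the `SingleFlightCache` class, the `fake_origin_fetch` function, and the `simulate_burst` helper are all hypothetical names, threads stand in for replicas, and an in-memory dictionary stands in for the local cache tier. The first caller for a file becomes the fetcher, and every other caller waits on the same result instead of going to origin.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlightCache:
    """Hypothetical sketch: one origin fetch per file, shared by all callers."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._entries = {}  # file name -> Future holding the fetched bytes

    def get(self, name):
        # Decide under the lock who is responsible for fetching this file.
        with self._lock:
            future = self._entries.get(name)
            is_fetcher = future is None
            if is_fetcher:
                future = Future()
                self._entries[name] = future

        if is_fetcher:
            try:
                future.set_result(self._fetch(name))
            except Exception as exc:
                # Drop the failed entry so a later caller can retry.
                with self._lock:
                    self._entries.pop(name, None)
                future.set_exception(exc)

        # Fetcher and waiters all read the same shared result.
        return future.result()


def simulate_burst(replicas=50):
    """Simulate many replicas requesting the same weights file at once."""
    origin_calls = []
    calls_lock = threading.Lock()

    def fake_origin_fetch(name):
        # Stand-in for a slow download from origin storage.
        with calls_lock:
            origin_calls.append(name)
        time.sleep(0.05)
        return f"weights for {name}".encode()

    cache = SingleFlightCache(fake_origin_fetch)
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        results = list(pool.map(cache.get, ["model.safetensors"] * replicas))
    return results, len(origin_calls)


if __name__ == "__main__":
    results, origin_fetches = simulate_burst(50)
    print(f"{len(results)} replicas served, {origin_fetches} origin fetch(es)")
```

In this sketch the completed entry stays in the dictionary, so requests that arrive after the download finishes are also served locally. A real system would additionally need eviction, on-disk storage, and coordination across machines, none of which is modeled here.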
Deploy large models on Baseten Cloud and let BDN handle weight delivery. The gains are largest in burst scenarios, where dozens of replicas need the same model weights at once.