Cold starts for large models are one of the hardest problems in AI inference infrastructure. Today we're launching the Baseten Delivery Network (BDN) to address one of the most difficult parts of that problem. BDN delivers 2–3x faster cold starts for large models at scale through optimizations at the pod, node, and cluster levels.
Baseten Delivery Network Cuts Large Model Cold Starts by 2–3x
Baseten, an AI inference platform, launched the Baseten Delivery Network (BDN) to cut cold starts 2–3x for large models. Three mechanisms work together. First, weights are mirrored to Baseten-managed storage at push time, which removes runtime dependencies on Hugging Face, S3, and GCS. Second, a three-tier cache (local NVMe → peer nodes → mirrored origin) serves weight data to replicas. Third, single-flight downloads ensure that only one node fetches any given file from origin.
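The tier ordering described above (local NVMe, then peer nodes, then the mirrored origin) can be illustrated with a minimal Python sketch. This is not BDN's implementation: the dictionaries are hypothetical in-memory stand-ins for each tier, and the names `fetch_weights`, `load`, and `TIERS` are invented for illustration. It only shows the lookup order and the idea of filling the local tier after a miss.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A tier is a (name, lookup) pair; lookup returns the bytes or None on a miss.
Tier = Tuple[str, Callable[[str], Optional[bytes]]]


def fetch_weights(key: str, tiers: List[Tier]) -> Tuple[str, bytes]:
    """Try each tier in order and return (tier_name, data) from the first hit."""
    for name, lookup in tiers:
        data = lookup(key)
        if data is not None:
            return name, data
    raise KeyError(f"{key} not found in any tier")


# Hypothetical in-memory stand-ins for the three tiers (assumption, not BDN's API).
local_nvme: Dict[str, bytes] = {}
peer_nodes: Dict[str, bytes] = {"model.safetensors": b"weights"}
mirrored_origin: Dict[str, bytes] = {
    "model.safetensors": b"weights",
    "tokenizer.json": b"{}",
}

TIERS: List[Tier] = [
    ("local_nvme", local_nvme.get),
    ("peer_nodes", peer_nodes.get),
    ("mirrored_origin", mirrored_origin.get),
]


def load(key: str) -> Tuple[str, bytes]:
    """Fetch a file, then populate the local tier so later reads hit locally."""
    tier, data = fetch_weights(key, TIERS)
    local_nvme[key] = data
    return tier, data
```

In this sketch, the first `load("model.safetensors")` is served by the peer tier, and a repeat call is served from the local tier because the first call populated it.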
BDN targets the thundering herd problem: at scale-up, 50–100 replicas simultaneously pull the same hundreds of gigabytes from origin, which saturates bandwidth for every replica. Single-flight assigns one responsible fetcher per file; the remaining replicas load from the local cache tier instead of racing to origin.
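The single-flight idea can be sketched in a few lines of Python. This is an in-process illustration under stated assumptions, not BDN's distributed mechanism: threads stand in for replicas, the `SingleFlight` class and `simulate_scale_up` helper are hypothetical names, and completed results are kept in memory to play the role of the cache that later callers read from.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class SingleFlight:
    """Ensure one caller per key runs the fetch; everyone else reuses its result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._futures: Dict[str, Future] = {}

    def do(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            fut = self._futures.get(key)
            leader = fut is None
            if leader:
                # First caller for this key becomes the responsible fetcher.
                fut = Future()
                self._futures[key] = fut
        if leader:
            try:
                fut.set_result(fetch())
            except Exception as exc:
                with self._lock:
                    self._futures.pop(key, None)  # let a later caller retry
                fut.set_exception(exc)
        # Followers block here until the leader finishes; errors propagate to all.
        return fut.result()


def simulate_scale_up(replicas: int) -> Tuple[List[bytes], int]:
    """Run `replicas` concurrent requests for one file; return results and origin call count."""
    flight = SingleFlight()
    origin_calls = [0]

    def fetch_from_origin() -> bytes:
        origin_calls[0] += 1  # only the leader runs this, so no race
        time.sleep(0.05)      # stand-in for a slow origin download
        return b"weights"

    with ThreadPoolExecutor(max_workers=replicas) as pool:
        pending = [
            pool.submit(flight.do, "model.safetensors", fetch_from_origin)
            for _ in range(replicas)
        ]
        results = [p.result() for p in pending]
    return results, origin_calls[0]
```

Calling `simulate_scale_up(50)` returns the same payload for all 50 callers while the origin fetch runs once. A production system would also need eviction, timeouts, and coordination across nodes, none of which this sketch attempts.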
Deploy large models on Baseten Cloud and let BDN handle weight delivery. The gains are largest in burst scenarios, where dozens of replicas need the same model weights at once.
Baseten
@baseten

