🪣 We just shipped Storage Buckets: S3-like mutable storage, cheaper & faster
Git falls short for everything on the high-throughput side of AI (checkpoints, processed data, agent traces, logs, etc.)
Buckets fixes that: fast writes, overwrites, directory sync 💨
All powered by Xet dedup, so successive checkpoints skip the bytes that already exist
Hugging Face Launches Storage Buckets for High-Throughput ML Workflows
Hugging Face launched Storage Buckets, a mutable, non-versioned object storage layer on the Hub for ML artifacts that change constantly: training checkpoints, optimizer states, processed data shards, and agent traces. Unlike Git-backed repos, Buckets support fast writes, overwrites, and directory sync. Buckets are addressable via hf://buckets/username/bucket-name and manageable via the hf CLI or the huggingface_hub library (since v1.5.0).
Buckets run on Xet, Hugging Face's chunk-based storage backend, so successive checkpoints with frozen model sections skip bytes already stored, cutting bandwidth and storage footprint. Enterprise billing is based on deduplicated storage, so shared chunks reduce costs. Hugging Face partners with AWS and GCP for pre-warming, which brings data close to compute before training runs.
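To make the dedup claim concrete, here is a toy sketch of chunk-level deduplication in Python. It is not Xet's actual algorithm: it assumes tiny fixed-size chunks and an in-memory dictionary, and the names `ChunkStore`, `upload`, and `download` are hypothetical, chosen only for illustration. It shows why a second checkpoint that shares frozen sections with the first only needs to send its changed bytes.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, for illustration only


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def upload(self, data: bytes):
        """Store data; return (manifest of chunk digests, bytes newly stored)."""
        manifest = []
        new_bytes = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks the store has never seen cost storage/bandwidth.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            manifest.append(digest)
        return manifest, new_bytes

    def download(self, manifest) -> bytes:
        """Reassemble the original bytes from a manifest."""
        return b"".join(self.chunks[d] for d in manifest)


store = ChunkStore()
frozen = b"AAAABBBBCCCC"      # stands in for frozen model sections
ckpt_1 = frozen + b"1111"     # first checkpoint
ckpt_2 = frozen + b"2222"     # second checkpoint: only the tail changed

manifest_1, sent_1 = store.upload(ckpt_1)
manifest_2, sent_2 = store.upload(ckpt_2)
print(sent_1, sent_2)  # 16 4
```

The first upload stores all 16 bytes; the second stores only the 4 bytes that changed, while both checkpoints remain fully recoverable from their manifests.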
Buckets are included in existing Hub storage plans — free accounts get storage to start; PRO and Enterprise offer higher limits. Sync a checkpoint directory with hf buckets sync or access bucket contents from any fsspec-compatible library via hf:// paths.
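As a rough illustration of the addressing scheme, the sketch below splits an hf://buckets/... URI into its parts using only the Python standard library. It makes no network calls and does not use huggingface_hub. The helper `parse_bucket_uri` and the `BucketPath` type are hypothetical, and the assumption that a file's path inside the bucket simply follows the bucket name is ours, not something the announcement spells out.

```python
from typing import NamedTuple

BUCKET_PREFIX = "hf://buckets/"


class BucketPath(NamedTuple):
    owner: str   # user or organization name
    bucket: str  # bucket name
    key: str     # path inside the bucket; "" means the bucket root


def parse_bucket_uri(uri: str) -> BucketPath:
    """Split 'hf://buckets/<owner>/<bucket>[/<key>]' into its components."""
    if not uri.startswith(BUCKET_PREFIX):
        raise ValueError(f"not a bucket URI: {uri!r}")
    parts = uri[len(BUCKET_PREFIX):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"missing owner or bucket name: {uri!r}")
    key = parts[2] if len(parts) == 3 else ""
    return BucketPath(parts[0], parts[1], key)


# Assumed layout: the in-bucket path follows the bucket name.
path = parse_bucket_uri("hf://buckets/username/bucket-name/checkpoints/step-100.pt")
print(path.owner, path.bucket, path.key)
```

A helper like this can be handy for validating paths in training scripts before handing them to an fsspec-compatible library.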
Hugging Face
@huggingface




