HeadsUpAI

Fireworks AI Powers Cursor Composer 2 With Distributed Global RL Infrastructure


Fireworks AI introduced a disaggregated sampling architecture that exploits the sparsity of weight updates in reinforcement learning (RL). Between consecutive checkpoints, over 98% of bf16 weights remain bit-identical. Instead of transferring the full 1TB model, the system sends a 20GB compressed delta (about 2% of the full size), reducing cross-region traffic by 98% while still reconstructing the new checkpoint exactly.
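
To make the delta idea concrete, here is a minimal sketch of encoding only the changed weights between two checkpoints and rebuilding the new checkpoint exactly. It is an illustration under stated assumptions, not Fireworks' implementation: the function names `make_delta` and `apply_delta` are hypothetical, bf16 weights are represented by their raw 16-bit patterns as `uint16` (NumPy has no native bf16 type), and `zlib` stands in for whatever compression the real system uses.

```python
import zlib
import numpy as np

def make_delta(old: np.ndarray, new: np.ndarray) -> bytes:
    """Encode the positions and new bit patterns of weights that changed."""
    assert old.dtype == np.uint16 and new.dtype == np.uint16
    assert old.shape == new.shape
    old_flat, new_flat = old.ravel(), new.ravel()
    # Bitwise comparison: a weight counts as unchanged only if all 16 bits match.
    changed = np.flatnonzero(old_flat != new_flat).astype(np.int64)
    values = new_flat[changed]
    header = np.array([changed.size], dtype=np.int64).tobytes()
    return zlib.compress(header + changed.tobytes() + values.tobytes())

def apply_delta(old: np.ndarray, delta: bytes) -> np.ndarray:
    """Rebuild the new checkpoint exactly from the old one plus the delta."""
    raw = zlib.decompress(delta)
    n = int(np.frombuffer(raw[:8], dtype=np.int64)[0])
    idx = np.frombuffer(raw[8:8 + 8 * n], dtype=np.int64)
    vals = np.frombuffer(raw[8 + 8 * n:], dtype=np.uint16)
    out = old.copy().ravel()
    out[idx] = vals  # overwrite only the changed positions
    return out.reshape(old.shape)
```

With 2% of entries changed, the delta carries an index and a value per changed weight rather than the whole tensor, which is where the traffic saving comes from. The 98% figure in the article depends on the real encoding and compression, which this sketch does not reproduce.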

This approach challenges the assumption that frontier RL requires a single, co-located mega-cluster. Because policy updates become small enough to ship between regions, teams can use fragmented GPU capacity wherever it is available. Cursor used this to train Composer 2 across three to four clusters worldwide, turning distributed inference into a unified pool for generating training data.

You can implement this via the Fireworks Training SDK, which supports either fully managed RL or a "bring your own trainer" setup. The platform provides OpenAI-compatible sampling endpoints and a weight update API. Together, these bound policy staleness to a few minutes and keep in-memory GPU weight swaps under 60 seconds.
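
One way to picture a staleness bound on the trainer side is to tag every rollout with the checkpoint version that produced it and drop rollouts whose policy is too old. The sketch below is purely illustrative and does not use the Fireworks SDK: `Rollout`, `fresh_rollouts`, the `published_at` mapping, and the 300-second default are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rollout:
    policy_version: int  # checkpoint version the sampler was serving
    text: str            # generated sample (placeholder for real trajectory data)

def fresh_rollouts(rollouts, published_at, latest_version, max_staleness_s=300.0):
    """Keep rollouts whose policy was published at most max_staleness_s
    seconds before the latest checkpoint.

    published_at maps checkpoint version -> publish time in seconds.
    """
    latest_time = published_at[latest_version]
    return [
        r for r in rollouts
        if latest_time - published_at[r.policy_version] <= max_staleness_s
    ]
```

For example, with checkpoints published at 0 s, 200 s, and 400 s and a 300-second bound, rollouts from the first checkpoint would be discarded once the third is live, while rollouts from the second would still be used.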

Fireworks AI (@FireworksAI_HQ) on X:

We’re seeing lots of interest in how Cursor delivered Composer 2. One less obvious insight: you don't need to spend billions on a giant cluster to do reinforcement learning. With disaggregated sampling, we ran @Cursor_ai Composer 2 training across 3-4 clusters worldwide, with a unified capacity of Fireworks Virtual Cloud. Check how we optimize cross-region 1TB+ model updates by 98%+ while keeping staleness under a few minutes: https://t.co/0Ziv6ssFNx
