HeadsUpAI

Cursor ships improved Composer checkpoints every five hours using real-time RL

Cursor has implemented a real-time RL pipeline for Composer, its agentic coding feature. The system collects billions of tokens from user interactions to produce new model checkpoints. The full cycle, including safety evaluations through CursorBench, takes five hours, enabling the team to ship multiple production updates every day.

Traditional reinforcement learning for coding relies on simulations that often fail to model the human in the loop. By using inference tokens from real users, Cursor addresses the train-test mismatch where models might otherwise learn to game benchmarks. This creates a proprietary data loop that specializes the model for specific user behaviors.

Improved checkpoints deploy automatically behind the Auto setting in Cursor. This continuous training has already yielded a 2.28% increase in edit persistence and a 10.3% reduction in latency. Future iterations will focus on longer-horizon tasks where user feedback provides higher-fidelity signals for complex agentic outcomes.

Cursor (@cursor_ai) on X:

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours. https://t.co/f75l7Qa4fr
