Cognition's SWE-1.6 Preview Beats SWE-1.5 by Over 11 Points on Agentic Coding Benchmark

Cognition


Cognition released an early SWE-1.6 preview that scores 51.7% on SWE-Bench Pro, more than 11 percentage points above SWE-1.5, at the same 950 tok/s speed. It beats top open-source models on the benchmark, and early access is rolling out to select users.

Cognition, the team behind Devin, shared an early checkpoint of its ongoing SWE-1.6 training run. Trained with 100x more RL compute than SWE-1.5 on a 6x faster stack, it scores 51.7% on SWE-Bench Pro, 11.6 percentage points above SWE-1.5's 40.1%, at the same 950 tok/s generation speed.
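The headline figure is easy to misread, since a gain in percentage points is not the same as a relative percentage gain. A minimal sketch of the arithmetic, using only the two SWE-Bench Pro scores quoted above (the function names are illustrative, not from Cognition):

```python
# Scores as quoted in this update (SWE-Bench Pro, percent of tasks resolved).
SWE_1_5 = 40.1
SWE_1_6_PREVIEW = 51.7


def point_gain(old: float, new: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return round(new - old, 1)


def relative_gain_pct(old: float, new: float) -> float:
    """Improvement expressed as a percentage of the old score."""
    return round((new - old) / old * 100, 1)


print(point_gain(SWE_1_5, SWE_1_6_PREVIEW))         # 11.6
print(relative_gain_pct(SWE_1_5, SWE_1_6_PREVIEW))  # 28.9
```

Going by these two numbers, the preview is about 11.6 points ahead in absolute terms, which works out to roughly a 29% relative improvement over SWE-1.5.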

The training surfaced a finding Cognition calls the "Model UX" gap: higher benchmark scores don't always mean better real-world behavior. The model iterates harder on tough problems but also overthinks and over-verifies in ways that hurt interactivity. Cognition describes this as an active area of research.

Early access is rolling out to a small group to help tune this behavior. Follow Cognition's platform updates if you want to test SWE-1.6 before the full release.

Cognition (@cognition) on X

We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves upon SWE-1.5 while being post-trained on the same pre-trained model - and it runs equally as fast at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model https://t.co/poTXyKcKnj

113 retweets

