Windsurf Partners With Cerebras to Deliver 1000 Tokens Per Second Coding

Windsurf

May 8, 2026

Windsurf integrated Cerebras inference to power SWE-1.6 Fast Mode, reaching speeds of 1,000 tokens per second for agentic workflows. This performance milestone aims to eliminate the latency bottleneck in multi-step planning and autonomous code generation.

Windsurf, an AI-native editor from Cognition, partnered with Cerebras to serve its SWE-1.6 model at 1,000 tokens per second. This "Fast Mode" uses specialized inference hardware to accelerate the model, following a pattern seen in Cognition's SWE-1.6 Fast release for terminal-based agents.

Model: SWE-1.6
Inference speed: 1,000 tokens per second
Hardware: Cerebras Inference
Credits per giveaway plan: $250
Availability: Windsurf Editor

The update follows Windsurf 2.0 and its background delegation. Real-time interactive coding, however, still suffers from frontier model latency. Generation at 1,000 tok/s complements Windsurf's parallel agentic coding and reduces the friction of rapid iteration.
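To make the latency point concrete, here is a rough back-of-the-envelope sketch of how output speed affects a sequential agent loop. Only the 1,000 tok/s figure comes from the announcement. The baseline speed, step count, and tokens per step are hypothetical values chosen for illustration, and the estimate ignores network, prompt processing, and tool-call time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation, ignoring network and tool latency."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def loop_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Total generation time for a multi-step agent loop whose steps run one after another."""
    return steps * generation_seconds(tokens_per_step, tokens_per_second)


FAST_MODE_TPS = 1_000    # figure quoted in the announcement
BASELINE_TPS = 100       # hypothetical slower model, for comparison only
STEPS = 10               # hypothetical number of agent steps
TOKENS_PER_STEP = 800    # hypothetical output size per step

fast = loop_seconds(STEPS, TOKENS_PER_STEP, FAST_MODE_TPS)
slow = loop_seconds(STEPS, TOKENS_PER_STEP, BASELINE_TPS)
print(f"fast mode: {fast:.1f} s, baseline: {slow:.1f} s, speedup: {slow / fast:.0f}x")
```

Under these assumed numbers, the same ten-step loop spends about 8 seconds generating tokens in Fast Mode versus about 80 seconds at the hypothetical baseline. Real workflows will differ, since tool calls and context processing add time that output speed alone does not remove.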

To promote the new tier, Windsurf is giving away five plans with $250 in credits each to users who engage with the announcement. Windsurf positions the Cerebras-powered mode as a faster option for planning and development tasks that demand state-of-the-art capabilities without the typical wait times of high-reasoning models.

View the full update on cognition.ai

Windsurf

@windsurf · May 7

We've teamed up with @cerebras to offer free Windsurf plans for SWE-1.6 Fast Mode at up to 1000 tok/s! Fast Mode is built on Cerebras inference, enabling superior speed for planning and development with state of the art capabilities.

Still wondering? A few quick answers below.

What is Windsurf SWE-1.6 Fast Mode?

Windsurf SWE-1.6 Fast Mode is a high-speed version of Cognition's agentic coding model integrated into the Windsurf editor. It is designed for rapid software engineering tasks, including planning and multi-file development. The mode uses specialized inference hardware to minimize the latency typically found in complex AI agent workflows that require multiple reasoning steps.

How fast is the Cerebras-powered SWE-1.6 model?

SWE-1.6 Fast Mode runs at speeds of up to 1,000 tokens per second. This performance is achieved through a partnership with Cerebras, using its specialized inference infrastructure. That is significantly faster than standard cloud-hosted models, allowing the AI agent to generate code and iterate on complex engineering plans almost instantaneously.

How can users get access to Windsurf SWE-1.6 Fast Mode?

Windsurf is currently running a promotion in which five users can win free plans with $250 in credits to access the Cerebras-powered Fast Mode. To qualify, users must reply to the official announcement from Cerebras. While a free version of SWE-1.6 exists at lower speeds, the 1,000 tokens per second tier is a premium offering.

How does SWE-1.6 Fast Mode compare to Claude?

According to the announcement, SWE-1.6 Fast Mode powered by Cerebras offers a clear speed advantage in side-by-side comparisons with models like Claude. This increased throughput allows for more iterations, faster bug fixes, and better overall code quality by reducing the time the AI agent spends in the planning and execution phases of development.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Cognition Hits 1000 Tokens Per Second in Devin for Terminal

Cognition released SWE-1.6 Fast for its terminal-based agent, achieving 1,000 tokens per second through a partnership with Cerebras. This speed enables near-instantaneous agentic loops, allowing developers to start tasks locally and hand them off to cloud VMs for persistent execution.

Windsurf Adds Claude Opus 4.7 Fast Mode to Accelerate Agentic Coding

Windsurf · May 13

Windsurf added support for Claude Opus 4.7 fast mode, advertised at roughly 2.5x higher output speeds while preserving the model's full intelligence. The integration is live inside the Windsurf IDE built by agent lab Cognition.

Cognition's SWE-1.6 Preview Beats SWE-1.5 by 11% on Agentic Coding Benchmark

swyx · Mar 1

Cognition released an early SWE-1.6 preview scoring 51.7% on SWE-Bench Pro, an 11-point jump over SWE-1.5 at the same 950 tok/s speed. It beats top open-source models on the benchmark, with early access rolling out to select users.

LightSeek Foundation Launches TokenSpeed to Optimize Blackwell for Agentic AI

LightSeek Foundation · May 7

LightSeek Foundation released TokenSpeed, an open-source inference engine designed specifically for the long-context and high-throughput demands of AI coding agents. By optimizing kernels for NVIDIA Blackwell hardware, the system achieves higher performance than TensorRT-LLM on agentic benchmarks while maintaining the usability of vLLM.
