Cognition Hits 1000 Tokens Per Second in Devin for Terminal

Cognition

May 8, 2026 · Updated May 16, 2026

Cognition released SWE-1.6 Fast for its terminal-based agent, achieving 1,000 tokens per second through a partnership with Cerebras. This speed enables near-instantaneous agentic loops, allowing developers to start tasks locally and hand them off to cloud VMs for persistent execution.

Cognition, an applied AI lab building end-to-end software agents, released SWE-1.6 Fast to extend Devin for Terminal. The update brings inference speeds of 1,000 tokens per second (inference is the process of running a trained model to generate outputs from inputs). Cerebras hardware powers this performance, which is intended to remove the latency bottleneck in agentic reasoning.

Inference speed: 1,000 tokens per second
Hardware partner: Cerebras
CLI language: Rust
Supported models: SWE-1.6 Fast, GPT-5.5, Opus 4.7
Availability: Devin for Terminal

High-speed inference is critical for agentic coding because agents must observe, reason, and act in iterative loops, so generation latency is paid on every step. By reaching 1,000 tokens per second, Cognition moves toward real-time autonomous engineering and matches Windsurf's Cerebras-powered SWE-1.6 Fast Mode, alongside other hardware-accelerated inference stacks that prioritize low-latency execution.
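To make the effect of token throughput on an iterative loop concrete, here is a rough back-of-the-envelope sketch. The function name, the step count, the tokens generated per step, and the 100 tokens-per-second baseline are all hypothetical values chosen for illustration; only the 1,000 and 950 tokens-per-second figures come from the reporting. It models generation time only, with an optional per-step tool delay, and is not a measurement of Devin for Terminal.

```python
def loop_seconds(steps: int, tokens_per_step: int, tokens_per_second: float,
                 tool_seconds_per_step: float = 0.0) -> float:
    """Estimate wall-clock seconds for an agent loop.

    Each step generates `tokens_per_step` tokens at `tokens_per_second`,
    then optionally waits `tool_seconds_per_step` for a tool call.
    All inputs are illustrative assumptions, not measured values.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    generation = steps * tokens_per_step / tokens_per_second
    tools = steps * tool_seconds_per_step
    return generation + tools


if __name__ == "__main__":
    # Hypothetical workload: 20 loop iterations, 500 generated tokens each.
    for rate in (100, 950, 1000):
        seconds = loop_seconds(steps=20, tokens_per_step=500, tokens_per_second=rate)
        print(f"{rate:>5} tok/s: {seconds:.1f} s spent generating")
```

Under these assumed numbers, the loop spends 10 seconds generating at 1,000 tokens per second versus 100 seconds at the hypothetical 100 tokens-per-second baseline. Once generation is this fast, time spent in tool calls (the `tool_seconds_per_step` term) is likely to dominate the loop.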

You can install the Rust-based CLI locally to give the agent direct access to your codebase. The tool supports a hybrid workflow: you start a task locally, then hand it off to a cloud VM. Beyond the native model, Devin for Terminal builds on Devin's existing GPT-5.5 integration and adds support for Opus 4.7.

View the full update on devin.ai

Cognition

@cognition · May 7

Intelligence at 1000 tokens per second, right in your terminal. Now available with SWE-1.6 Fast, powered by @cerebras. We're giving the first 100 people who respond a free month of Max to try it out. https://t.co/ExGS4bu4YB

Still wondering? A few quick answers below.

What is Devin for Terminal?

Devin for Terminal is a command-line interface that allows Cognition's autonomous AI software engineer to run on a developer's local machine. It provides the agent with direct access to the local codebase, tools, and environment. Developers can use it for interactive work and then hand off complex tasks to a cloud-based virtual machine for persistent execution.

How fast is the SWE-1.6 Fast model?

The SWE-1.6 Fast model achieves an inference speed of 1,000 tokens per second. This performance is powered by Cerebras hardware, specialized silicon designed to accelerate AI workloads. The speed is intended to make the agentic loop of reasoning and execution feel near-instantaneous for developers working in the terminal.

How do you install Devin for Terminal?

You can install Devin for Terminal by running a command in your shell that downloads and executes an installation script from the official website. The tool is written in Rust for performance and runs on local machines while staying connected to Devin's cloud-based engineering environment.

Does Devin for Terminal support other AI models?

Yes, Devin for Terminal is a multi-model platform. While SWE-1.6 Fast is the featured high-speed model, the terminal also supports other frontier models, including GPT-5.5 and Opus 4.7. Developers can choose the model that best fits their engineering task while remaining in the same local terminal environment.

What is the local-to-cloud handoff feature?

The local-to-cloud handoff is a hybrid workflow that lets you start an engineering task on your local machine using Devin for Terminal. Once the agent has the necessary context, you can delegate the work to a cloud-based virtual machine. The agent can then continue testing and fixing code in the background even after you close your laptop.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Cognition Releases SWE-1.6 with Optimized Agent UX and Unmatched Speed

Cognition launched SWE-1.6 in the Windsurf IDE, a model post-trained to eliminate agentic friction like looping and overthinking. It delivers up to 950 tokens per second, prioritizing efficient multi-step trajectories over raw benchmark gains.

Windsurf Partners With Cerebras to Deliver 1000 Tokens Per Second Coding

Windsurf · May 8

Windsurf integrated Cerebras inference to power SWE-1.6 Fast Mode, reaching speeds of 1,000 tokens per second for agentic workflows. This performance milestone aims to eliminate the latency bottleneck in multi-step planning and autonomous code generation.

Cognition's SWE-1.6 Preview Beats SWE-1.5 by 11% on Agentic Coding Benchmark

swyx · Mar 1

Cognition released an early SWE-1.6 preview scoring 51.7% on SWE-Bench Pro — an 11-point jump over SWE-1.5 at the same 950 tok/s speed. It beats top open-source models on the benchmark, with early access rolling out to select users.

NVIDIA Vera Rubin Hits 400 Tokens Per Second for Trillion Parameter Models

NVIDIA · May 6

NVIDIA's Vera Rubin platform uses a co-designed stack of seven specialized chips to solve the high-latency and cost bottlenecks of autonomous AI agents. By integrating dedicated hardware for token generation and tool execution, the system maintains high interactivity for trillion-parameter models while reducing token costs by 90 percent compared to previous architectures.
