HeadsUpAI

Cognition Hits 1000 Tokens Per Second in Devin for Terminal


Cognition, an applied AI lab building end-to-end software agents, has released SWE-1.6 Fast for Devin for Terminal. The update brings inference (running a trained model to generate outputs from inputs) to 1,000 tokens per second. The model runs on Cerebras hardware, a pairing meant to remove inference speed as a bottleneck in agentic reasoning.
Inference speed: 1,000 tokens per second
Hardware partner: Cerebras
CLI language: Rust
Supported models: SWE-1.6 Fast, GPT-5.5, Opus 4.7
Availability: Devin for Terminal

High-speed inference is critical for agentic coding because agents observe, reason, and act in iterative loops, and every iteration waits on model output. By crossing the 1,000-tokens-per-second threshold, Cognition moves toward real-time autonomous engineering. The release matches Windsurf's Cerebras-powered SWE-1.6 Fast Mode and joins other hardware-accelerated inference stacks that prioritize low-latency execution.
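To make the effect of throughput on a loop concrete, here is a rough back-of-the-envelope sketch. The workload (20 iterations of 500 generated tokens each) and the 100 tokens-per-second comparison rate are illustrative assumptions, not figures from Cognition or Cerebras. The calculation counts generation time only and ignores network, prompt processing, and tool execution latency.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on generating `tokens` at a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def loop_generation_seconds(steps: int, tokens_per_step: int,
                            tokens_per_second: float) -> float:
    """Total generation time for an agent loop of `steps` iterations."""
    return steps * generation_seconds(tokens_per_step, tokens_per_second)


if __name__ == "__main__":
    # Hypothetical workload: 20 loop iterations, 500 generated tokens each.
    # 100 tok/s is an assumed comparison rate; 1,000 tok/s is the article's figure.
    for rate in (100.0, 1000.0):
        total = loop_generation_seconds(20, 500, rate)
        print(f"{rate:.0f} tok/s -> {total:.1f} s of generation")
```

Under these assumptions, the same loop spends 100 seconds generating at 100 tokens per second and 10 seconds at 1,000. Real sessions will differ, since tool calls and test runs often take up much of the wall-clock time.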

You can install the Rust-based CLI locally to give the agent direct access to your codebase. The tool supports a hybrid workflow: you start a task locally, then hand it off to a cloud VM. Beyond the native model, the terminal supports Opus 4.7 in addition to Devin's existing GPT-5.5 integration.

Cognition (@cognition) on X:

Intelligence at 1000 tokens per second, right in your terminal. Now available with SWE-1.6 Fast, powered by @cerebras. We're giving the first 100 people who respond a free month of Max to try it out. https://t.co/ExGS4bu4YB


Still wondering? A few quick answers below.

Devin for Terminal is a command-line interface that allows Cognition's autonomous AI software engineer to run on a developer's local machine. It provides the agent with direct access to the local codebase, tools, and environment. Developers can use it for interactive work and then hand off complex tasks to a cloud-based virtual machine for persistent execution.

The SWE-1.6 Fast model achieves an inference speed of 1,000 tokens per second. This high-speed performance is powered by Cerebras hardware, which is specialized silicon designed to accelerate AI workloads. This speed is intended to make the agentic loop of reasoning and execution feel near-instantaneous for developers working in the terminal.

You can install Devin for Terminal by running a specific command in your shell that downloads and executes an installation script from the official website. The tool is written in Rust to ensure high performance and is optimized to run on local machines while maintaining a seamless connection to Devin's cloud-based engineering environment.

Devin for Terminal is a multi-model platform. While SWE-1.6 Fast is the featured high-speed model, the terminal also supports other frontier models, including GPT-5.5 and Opus 4.7. Developers can pick the model that best fits their engineering task without leaving the local terminal environment.

The local-to-cloud handoff is a hybrid workflow that allows you to start an engineering task on your local machine using Devin for Terminal. Once the agent has the necessary context, you can delegate the work to a cloud-based virtual machine. This enables the agent to continue testing and fixing code in the background even after you close your laptop.
