OpenAI Speeds Up Agentic Loops With Persistent WebSocket Connections

OpenAI

Apr 30, 2026

OpenAI introduced WebSocket support for its Responses API to eliminate the latency overhead of traditional HTTP requests in multi-step agent workflows. By maintaining a persistent connection and caching conversation state in memory, the system allows coding agents to run up to 40% faster end to end.

OpenAI launched WebSocket support for the Responses API, introducing a persistent connection mode for high-frequency agentic loops (the iterative cycle of AI planning and action). Instead of establishing new HTTP connections for every tool call, the API now keeps a "warm" state that caches tool definitions and conversation history in memory.

End-to-end speedup: Up to 40%
Inference speed: 1,000 tokens per second
Peak burst speed: 4,000 tokens per second
Protocol: WebSockets
Availability: Responses API
Optimized model: GPT-5.3-Codex-Spark

As model inference (the process of generating outputs) speeds have surged past 1,000 tokens per second, the API's structural overhead became the primary bottleneck. This shift follows the OpenAI performance roadmap and extends support for multi-day autonomous Codex workflows that require high-speed, persistent execution for complex engineering tasks.

You can implement WebSocket mode by passing a previous_response_id to continue a conversation without re-sending the full history, matching the pattern seen in Vercel's AI SDK. The feature is available now for developers using the Responses API, specifically optimized for GPT-5.3-Codex-Spark.

View the full update on openai.com

OpenAI Developers

@OpenAIDevsApr 29

⚙️ We made agent loops faster with WebSockets in the Responses API As Codex got faster, the bottleneck moved from inference to inefficient API calls WebSockets keep response state warm across tool calls, helping workflows run up to 40% faster end to end https://t.co/nFeUEdRdKt

38586

View on X

Still wondering? A few quick answers below.

It is a persistent connection transport for the Responses API designed to reduce latency in agentic workflows. By keeping a connection open instead of making repeated HTTP calls, the server can cache conversation state, tool definitions, and rendered tokens in memory, allowing the system to process multi-step tasks significantly faster.

OpenAI reports that agentic workflows run up to 40% faster end to end when using WebSocket mode. This improvement is achieved by eliminating redundant API work, such as re-tokenizing the full conversation history and re-validating requests, which previously added significant overhead as model inference speeds increased toward 1,000 tokens per second.

Developers can use the familiar response.create structure but include a previous_response_id to continue a conversation from a cached state. This allows the server to fetch the previous response object and prior input items from a connection-scoped cache rather than rebuilding the entire context from scratch for every follow-up request in the loop.

The update was specifically designed to support high-speed models like GPT-5.3-Codex-Spark, which runs on specialized hardware at over 1,000 tokens per second. However, other models including GPT-5.4 and subsequent releases also benefit from the reduced API overhead, making multi-file coding tasks and complex tool-use workflows feel much more responsive.

Following a successful alpha period with coding agent startups like Cursor, Vercel, and Cline, WebSocket mode is now available for general use within the Responses API. It is intended for developers building autonomous systems that require frequent back-and-forth communication between the model and local tools or computer environments.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenAI →

Keep reading

OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

OpenAI added a container pool to the Responses API, cutting container startup time by around 10x for skills, shell, and code interpreter. Requests now reuse warm infrastructure instead of creating a fresh container each session, reducing latency for agent workflows.

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

VercelMay 14

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

Vercel added a high-speed inference tier for Anthropic's Claude Opus 4.7 on its AI Gateway, delivering 2.5x faster output token generation. This update allows developers to trade higher costs for reduced latency in complex agentic loops where reasoning speed is the primary bottleneck.

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

CloudflareApr 24

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

Cloudflare added OpenAI's GPT-5.5 to its AI Gateway, featuring a 1M token context window and 2x cost efficiency over competing frontier coding models. The model is optimized for agentic loops, enabling systems to plan, use tools, and self-verify their work until a task is complete.

What is the OpenAI Responses API WebSocket mode?

How much faster are agentic workflows with WebSockets?

How do developers implement WebSocket mode in the Responses API?

Which models benefit most from the Responses API WebSocket update?

Is the OpenAI WebSocket mode available to everyone?

Keep reading

OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

Keep reading

OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

Vercel Launches Fast Mode for Opus 4.7 to Accelerate Agentic Workflows

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents