⚙️ We made agent loops faster with WebSockets in the Responses API As Codex got faster, the bottleneck moved from inference to inefficient API calls WebSockets keep response state warm across tool calls, helping workflows run up to 40% faster end to end https://t.co/nFeUEdRdKt
OpenAI Speeds Up Agentic Loops With Persistent WebSocket Connections
OpenAIOpenAI introduced WebSocket support for its Responses API to eliminate the latency overhead of traditional HTTP requests in multi-step agent workflows. By maintaining a persistent connection and caching conversation state in memory, the system allows coding agents to run up to 40% faster end to end.
- End-to-end speedup
- Up to 40%
- Inference speed
- 1,000 tokens per second
- Peak burst speed
- 4,000 tokens per second
- Protocol
- WebSockets
- Availability
- Responses API
- Optimized model
- GPT-5.3-Codex-Spark
As model inference (the process of generating outputs) speeds have surged past 1,000 tokens per second, the API's structural overhead became the primary bottleneck. This shift follows the OpenAI performance roadmap and extends support for multi-day autonomous Codex workflows that require high-speed, persistent execution for complex engineering tasks.
You can implement WebSocket mode by passing a previous_response_id to continue a conversation without re-sending the full history, matching the pattern seen in Vercel's AI SDK. The feature is available now for developers using the Responses API, specifically optimized for GPT-5.3-Codex-Spark.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →




