OpenAI Responses API Gets Container Pool for 10x Faster Agent Workflows

OpenAI

Mar 21, 2026 · Updated Apr 25, 2026

OpenAI added a container pool to the Responses API, cutting container startup time by around 10x for skills, shell, and code interpreter. Requests now reuse warm infrastructure instead of creating a fresh container each session, reducing latency for agent workflows.

OpenAI's Responses API now includes a container pool, so requests reuse warm infrastructure for agent tool execution. Previously, each session that used skills, shell, or code interpreter had to create a full container from scratch. With the pool in place, OpenAI says containers spin up about 10x faster.

This matters for developers building multi-step agent workflows. Container startup overhead added latency at every step: shell commands, code interpreter sessions, and skill-based tasks all waited for fresh containers. Reusing warm containers removes most of that wait, which shortens each pass through the agent loop.
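The effect of reuse on a multi-step loop can be illustrated with a toy model. The sketch below is not OpenAI's implementation. The `ContainerPool` class, the helper functions, and the cost constants are all hypothetical, with the 10:1 ratio chosen only to mirror the "about 10x" figure in the announcement.

```python
from collections import deque
from dataclasses import dataclass
import itertools

# Illustrative startup costs in arbitrary units (assumed, not measured).
COLD_START_COST = 10
WARM_START_COST = 1


@dataclass
class Container:
    id: int


class ContainerPool:
    """Toy warm pool: hands out idle containers before creating new ones."""

    def __init__(self, warm_size: int) -> None:
        self._ids = itertools.count(1)
        self._idle = deque(Container(next(self._ids)) for _ in range(warm_size))

    def acquire(self) -> tuple[Container, int]:
        """Return a container and the startup cost paid to obtain it."""
        if self._idle:
            return self._idle.popleft(), WARM_START_COST
        return Container(next(self._ids)), COLD_START_COST

    def release(self, container: Container) -> None:
        """Put a container back so a later step can reuse it."""
        self._idle.append(container)


def startup_cost_without_pool(n_steps: int) -> int:
    """Every step creates a fresh container."""
    return n_steps * COLD_START_COST


def startup_cost_with_pool(pool: ContainerPool, n_steps: int) -> int:
    """Every step borrows a container from the pool and returns it."""
    total = 0
    for _ in range(n_steps):
        container, cost = pool.acquire()
        total += cost
        pool.release(container)
    return total


if __name__ == "__main__":
    print(startup_cost_without_pool(5))                # 50
    print(startup_cost_with_pool(ContainerPool(1), 5))  # 5
```

In this model a five-step workflow pays the cold cost five times without a pool, and only the warm cost once a container is already waiting. Real latency depends on factors the announcement does not describe, such as pool size and how containers are reset between uses.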

Build agent workflows against the Responses API with shell, skills, or code interpreter — the container pool is live now, and requests reuse warm infrastructure by default.
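As a rough sketch of what such a request might look like with the official `openai` Python SDK, the snippet below builds a Responses API call that enables the code interpreter tool with an automatically managed container. The model name, the prompt, and the helper names `build_request` and `send` are placeholders chosen for illustration. Nothing in the request opts into the pool, on the assumption (taken from the announcement) that warm reuse happens on the server side.

```python
import os


def build_request(prompt: str, model: str = "gpt-4.1") -> dict:
    """Assemble keyword arguments for client.responses.create().

    The model name is a placeholder; substitute one your account can use.
    """
    return {
        "model": model,
        # Ask the API to provision and manage the container automatically.
        "tools": [{"type": "code_interpreter", "container": {"type": "auto"}}],
        "input": prompt,
    }


def send(request: dict) -> str:
    """Send the request; requires the `openai` package and OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily so the builder works offline

    client = OpenAI()
    response = client.responses.create(**request)
    return response.output_text


if __name__ == "__main__":
    req = build_request("Use Python to compute the mean of [3, 5, 8].")
    if os.environ.get("OPENAI_API_KEY"):
        print(send(req))
    else:
        # Without credentials, just show the payload that would be sent.
        print(req)
```

Timing the same call across several sessions is one way to observe startup latency in practice, though the announcement gives no guidance on how to measure it.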

View the full update on developers.openai.com

OpenAI Developers

@OpenAIDevs · Mar 21

Agent workflows got even faster. You can spin up containers for skills, shell and code interpreter about 10x faster. We added a container pool to the Responses API, so requests can reuse warm infrastructure instead of creating a full container creation each session. https://t.co/lmvwsaf5HN
