Cloudflare Launches Experimental Voice Pipeline for Real-Time Agent Interactions

Cloudflare

Apr 16, 2026 · Updated Apr 25, 2026

Cloudflare introduced the @cloudflare/voice package, an experimental extension for its Agents SDK that enables bidirectional voice communication over WebSockets. By unifying voice and text state within a single Durable Object, developers can build multimodal agents that maintain context across different interaction channels.

Cloudflare, a connectivity cloud and developer platform, released @cloudflare/voice, an experimental package for its Agents SDK that adds real-time speech-to-text (STT) and text-to-speech (TTS) capabilities. The pipeline uses WebSockets to stream audio directly to a Durable Object (a stateful, addressable server instance), allowing developers to implement a continuous voice loop in roughly 30 lines of server-side code.
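To make the shape of a voice loop concrete, here is a minimal sketch of a single turn: audio in, transcript, reply, audio out. Every name in it (SpeechToText, TextToSpeech, runVoiceTurn, and the stub providers) is hypothetical and not taken from @cloudflare/voice; the stubs treat audio bytes as UTF-8 text so the example runs without a speech model.

```typescript
// Hypothetical interfaces for illustration; not the @cloudflare/voice API.
interface SpeechToText {
  transcribe(audio: Uint8Array): string;
}
interface TextToSpeech {
  synthesize(text: string): Uint8Array;
}
type Reasoner = (utterance: string) => string;

// One turn of a voice loop: audio in -> transcript -> reply -> audio out.
function runVoiceTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  think: Reasoner,
  tts: TextToSpeech
): Uint8Array {
  const transcript = stt.transcribe(audio);
  const reply = think(transcript);
  return tts.synthesize(reply);
}

// Stub providers: they treat the bytes as UTF-8 text, standing in for real models.
const encoder = new TextEncoder();
const decoder = new TextDecoder();
const fakeStt: SpeechToText = { transcribe: (audio) => decoder.decode(audio) };
const fakeTts: TextToSpeech = { synthesize: (text) => encoder.encode(text) };
const echoAgent: Reasoner = (utterance) => `You said: ${utterance}`;
```

In a real deployment each stage would presumably stream rather than return whole buffers; the sketch only shows how the three stages compose.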

Traditional voice AI architectures are often fragmented: transport, transcription, and reasoning run as separate services, and each hop between them adds latency. This update runs the entire pipeline at the edge, which reduces network hops and lets voice and text inputs share the same SQLite-backed conversation history and tool access.
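The shared-history idea can be illustrated with a small sketch in which voice and text inputs go through the same handler and append to one log. This is an assumption-laden illustration: ConversationAgent, Turn, and Channel are hypothetical names, and a plain in-memory array stands in for the SQLite-backed storage described above.

```typescript
// Hypothetical types for illustration; not part of the Agents SDK.
type Channel = "voice" | "text";

interface Turn {
  channel: Channel;
  role: "user" | "agent";
  content: string;
}

// In-memory stand-in for one stateful agent instance.
class ConversationAgent {
  private history: Turn[] = [];

  // Both channels call the same method, so they share one history.
  handle(channel: Channel, content: string): string {
    this.history.push({ channel, role: "user", content });
    const reply = `Seen ${this.userTurns()} message(s) so far`;
    this.history.push({ channel, role: "agent", content: reply });
    return reply;
  }

  // Counts user turns across every channel.
  userTurns(): number {
    return this.history.filter((t) => t.role === "user").length;
  }

  // Returns a copy so callers cannot mutate the stored history.
  transcript(): Turn[] {
    return this.history.slice();
  }
}
```

A message typed first and a question spoken second land in the same log, so the reply to the spoken turn can take the typed turn into account.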

You can start building by adding the @cloudflare/voice package to an existing project. The system includes built-in Workers AI providers for Deepgram Flux and Deepgram Aura, plus a Twilio adapter for phone lines. Interruption handling is supported via an abort signal that stops synthesis when a user speaks.
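The interruption pattern can be sketched with the standard AbortController API: synthesis checks a signal before emitting each chunk and stops once the signal is aborted. The function names (synthesizeChunks, speakWithBargeIn) and the sentence-per-chunk model are assumptions made for illustration, not the package's actual interface.

```typescript
// Hypothetical stand-in for a TTS provider: emits one chunk per sentence
// and checks the abort signal before each one.
function synthesizeChunks(
  sentences: string[],
  signal: AbortSignal,
  onChunk: (text: string) => void
): number {
  let sent = 0;
  for (const sentence of sentences) {
    if (signal.aborted) break; // the user started speaking: stop synthesis
    onChunk(sentence);
    sent++;
  }
  return sent;
}

// Plays a reply and simulates the user interrupting after
// `interruptAfter` chunks have been played.
function speakWithBargeIn(sentences: string[], interruptAfter: number): string[] {
  const controller = new AbortController();
  const played: string[] = [];
  synthesizeChunks(sentences, controller.signal, (text) => {
    played.push(text);
    if (played.length === interruptAfter) controller.abort();
  });
  return played;
}
```

Calling speakWithBargeIn(["One.", "Two.", "Three."], 2) plays the first two chunks and drops the third, because the signal is already aborted when the loop reaches it.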

View the full update on blog.cloudflare.com

Cloudflare (@Cloudflare) · Apr 15

An experimental voice pipeline for the Agents SDK enables real-time voice interactions over WebSockets. Developers can now build agents with continuous STT and TTS in just ~30 lines of server-side code. https://t.co/fsXTBZzs3x

