An experimental voice pipeline for the Agents SDK enables real-time voice interactions over WebSockets. Developers can now build agents with continuous STT and TTS in just ~30 lines of server-side code. https://t.co/fsXTBZzs3x
Cloudflare Launches Experimental Voice Pipeline for Real-Time Agent Interactions
Cloudflare · Updated
Cloudflare introduced the @cloudflare/voice package, an experimental extension for its Agents SDK that enables bidirectional voice communication over WebSockets. By unifying voice and text state within a single Durable Object, developers can build multimodal agents that maintain context across different interaction channels.
The @cloudflare/voice package is an experimental addition to the Agents SDK that provides real-time speech-to-text and text-to-speech capabilities. The pipeline uses WebSockets to stream audio directly to a Durable Object (a stateful, addressable server instance), allowing developers to implement voice loops in roughly 30 lines of code.
Traditional voice AI architectures are fragmented: transport, transcription, and reasoning each run as a separate service, and every handoff between them adds latency. This update moves the entire pipeline to the edge, which reduces network hops and lets voice and text inputs share the same SQLite-backed conversation history and tool access.
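To make the shared-history idea concrete, here is a minimal sketch of one stateful object that records voice transcripts and typed messages in the same log. It does not use the @cloudflare/voice API. The `ConversationState` class and its method names are hypothetical, and a plain in-memory array stands in for the SQLite storage described above.

```typescript
type Channel = "voice" | "text";

interface Turn {
  role: "user" | "assistant";
  channel: Channel;
  content: string;
}

// Hypothetical stand-in for a single stateful agent instance.
// Assumption: a real deployment would persist turns in the Durable Object's
// SQLite storage; an array is used here only to keep the sketch runnable.
class ConversationState {
  private history: Turn[] = [];

  // Record a finished speech-to-text transcript as a user turn.
  addVoiceInput(transcript: string): void {
    this.history.push({ role: "user", channel: "voice", content: transcript });
  }

  // Record a typed chat message as a user turn in the same log.
  addTextInput(message: string): void {
    this.history.push({ role: "user", channel: "text", content: message });
  }

  // Record the agent's reply, tagged with the channel it was delivered on.
  addReply(channel: Channel, content: string): void {
    this.history.push({ role: "assistant", channel, content });
  }

  // The context handed to the model is built from one log, so a typed
  // follow-up can refer to something the user said aloud earlier.
  context(): string[] {
    return this.history.map((turn) => `${turn.role}: ${turn.content}`);
  }
}
```

Because both input methods append to one log, a user could ask for something by voice and refine it by text without the agent losing the thread.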
You can start building by adding the @cloudflare/voice package to an existing project. The system includes built-in Workers AI providers for Deepgram Flux and Deepgram Aura, plus a Twilio adapter for phone lines. Interruption handling is supported via an abort signal that stops synthesis when a user speaks.
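The interruption pattern can be illustrated with the standard `AbortController` API. The sketch below is not the package's own interface. `synthesize` and `VoiceTurn` are hypothetical names, and the "audio" is simulated as one text chunk per word, to show how an abort signal can stop synthesis once new user speech is detected.

```typescript
// Hypothetical stand-in for a streaming TTS provider: yields one chunk per word.
async function* synthesize(
  text: string,
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const word of text.split(" ")) {
    if (signal.aborted) return; // stop producing audio once the user barges in
    await Promise.resolve(); // simulated asynchronous synthesis step
    yield word;
  }
}

// Hypothetical turn handler: speaks a reply and lets the caller interrupt it.
class VoiceTurn {
  private controller = new AbortController();
  readonly spoken: string[] = [];

  // Streams chunks until the text is finished or the turn is interrupted.
  // onChunk is an optional hook, e.g. for forwarding audio to the client.
  async speak(text: string, onChunk?: (chunk: string) => void): Promise<void> {
    for await (const chunk of synthesize(text, this.controller.signal)) {
      this.spoken.push(chunk);
      onChunk?.(chunk);
    }
  }

  // Called when the speech-to-text side detects that the user started talking.
  interrupt(): void {
    this.controller.abort();
  }
}
```

In this sketch the signal is checked before each chunk, so an interruption takes effect at the next chunk boundary rather than in the middle of one.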