An experimental voice pipeline for the Agents SDK enables real-time voice interactions over WebSockets. Developers can now build agents with continuous STT and TTS in just ~30 lines of server-side code. https://t.co/fsXTBZzs3x
Cloudflare Launches Experimental Voice Pipeline for Real-Time Agent Interactions
Cloudflare, a connectivity cloud and developer platform, released @cloudflare/voice, an experimental package for its Agents SDK that adds real-time speech-to-text and text-to-speech capabilities. The pipeline uses WebSockets to stream audio directly to a Durable Object (a stateful, addressable server instance), allowing developers to implement voice loops in roughly 30 lines of code.
Traditional voice AI architectures are fragmented, requiring separate services for transport, transcription, and reasoning, which introduces latency. This update moves the entire pipeline to the edge, reducing network hops and ensuring that voice and text inputs share the same SQLite-backed conversation history and tool access.
You can start building by adding the @cloudflare/voice package to an existing project. The system includes built-in Workers AI providers for Deepgram Flux and Deepgram Aura, plus a Twilio adapter for phone lines. Interruption handling is supported via an abort signal that stops synthesis when a user speaks.
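To make the interruption behavior concrete, here is a minimal sketch of how an abort signal can cut speech synthesis short when a user starts talking. It uses only the standard AbortController API; the names synthesize, speak, and demo are hypothetical stand-ins for illustration and are not the @cloudflare/voice API, whose actual interface may differ.

```typescript
// Hypothetical stand-in for a TTS provider (not the @cloudflare/voice API):
// yields one "audio chunk" per word and stops as soon as the signal is aborted.
async function* synthesize(text: string, signal: AbortSignal): AsyncGenerator<string> {
  for (const word of text.split(" ")) {
    if (signal.aborted) return; // the user spoke: stop producing audio
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated synthesis latency
    if (signal.aborted) return; // re-check after the wait so a stale chunk is not sent
    yield word;
  }
}

// Streams a reply and returns the chunks that were sent before any interruption.
async function speak(
  text: string,
  signal: AbortSignal,
  onChunk?: (chunk: string) => void,
): Promise<string[]> {
  const sent: string[] = [];
  for await (const chunk of synthesize(text, signal)) {
    sent.push(chunk);
    onChunk?.(chunk);
  }
  return sent;
}

// Simulates a user interrupting right after the second chunk is played.
async function demo(): Promise<string[]> {
  const controller = new AbortController();
  return speak("one two three four five", controller.signal, (chunk) => {
    if (chunk === "two") controller.abort(); // assumed trigger: speech detected
  });
}
```

In this sketch, demo() resolves with only the first two chunks, because the abort fires before the third is synthesized. A real pipeline would presumably trigger the abort from speech detection on the incoming audio stream rather than from a chunk callback.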




