HeadsUpAI

PyTorch Extends ExecuTorch to On-Device Voice Agent Inference


ExecuTorch, PyTorch's native on-device inference platform, extends to voice workloads with reference implementations for five models: Voxtral Realtime for streaming transcription, Parakeet TDT for offline transcription, Sortformer for speaker diarization, Whisper, and Silero VAD. Models export directly via torch.export() with minimal changes, with no C++ rewrites or format conversions. A thin C++ layer handles orchestration while ExecuTorch runs inference across XNNPACK (CPU), Metal Performance Shaders (Apple GPU), CUDA, and Qualcomm NPU.

Open-source voice models are proliferating, but native deployment remains fragmented: most solutions require model-specific C++ rewrites or lock developers into one hardware ecosystem. ExecuTorch's write-once approach lets a single exported model run across Linux, macOS, Windows, Android, and iOS. LM Studio, a desktop app for running LLMs locally, already ships ExecuTorch-powered transcription in production.

Export your voice models and deploy across platforms from a single artifact. Sample apps for desktop and Android are ready to build on.

PyTorch (@PyTorch) on X:

#ExecuTorch addresses fragmented native deployment for #AI agents as a #PyTorch native platform. It enables voice models across CPU, GPU, and NPU on Android, iOS, Linux, macOS & Windows 🔗 https://t.co/NeQQyUniL4 https://t.co/O3itnoQFoG
