xAI Launches Standalone Grok Speech to Text and Text to Speech APIs

xAI released standalone Speech to Text and Text to Speech APIs built on the audio stack that powers Grok Voice and Tesla vehicles. The new endpoints offer high-accuracy transcription across 25 languages and expressive voice generation with granular emotional controls.

xAI, the AI research company behind the Grok model family, the Grok AI assistant, and the xAI API, has launched standalone Grok Speech to Text (STT) and Grok Text to Speech (TTS) APIs. These endpoints unbundle the audio stack used in Tesla and Starlink. The STT engine handles multi-speaker transcription, while the TTS engine uses inline speech tags such as [laugh] for expressive delivery.

This move positions xAI as a direct competitor to specialized speech providers. By focusing on Inverse Text Normalization (formatting spoken numbers into structured text), the models target business domains like legal and medical transcription. xAI's internal benchmarks claim high accuracy in identifying entities during phone calls.
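To make Inverse Text Normalization concrete, here is a minimal Python sketch of the idea: runs of spelled-out number words in a raw transcript are collapsed into digits. It is a toy illustration, not xAI's method. The function names `words_to_number` and `normalize` are hypothetical, and the sketch assumes a lowercase, unpunctuated transcript with English number words below one million.

```python
# Toy vocabulary for illustration only; a production ITN system also
# handles dates, currency, ordinals, phone numbers, and much more.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred", "thousand"}


def words_to_number(words):
    """Convert a run of number words (hypothetical helper) to an int."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
    return total + current


def normalize(transcript):
    """Replace each run of consecutive number words with its digits."""
    out, run = [], []
    for token in transcript.split():
        if token in NUMBER_WORDS:
            run.append(token)
            continue
        if run:  # a non-number word ends the current run
            out.append(str(words_to_number(run)))
            run = []
        out.append(token)
    if run:  # flush a run that ends the transcript
        out.append(str(words_to_number(run)))
    return " ".join(out)


print(normalize("the settlement was two thousand five hundred dollars due in thirty days"))
```

The sketch deliberately ignores harder cases, such as "one" used as a pronoun or digit-by-digit sequences like phone numbers, which is part of why ITN quality matters for legal and medical transcripts.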

Developers can integrate both services through a REST API or, for real-time streaming, a WebSocket API. Grok STT costs $0.10 per hour of audio for batch processing; Grok TTS costs $4.20 per million characters. Both are available now through the xAI API console, which includes a playground for testing expressive speech tags.
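For budgeting, the quoted prices translate into a simple back-of-the-envelope calculation. The Python sketch below assumes usage is billed linearly at the listed rates, with no minimums, rounding rules, or volume tiers, none of which the announcement specifies. The function names are hypothetical.

```python
from decimal import Decimal

# Prices as quoted in this article; the linear proration is an assumption.
STT_USD_PER_HOUR = Decimal("0.10")           # Grok STT, batch processing
TTS_USD_PER_MILLION_CHARS = Decimal("4.20")  # Grok TTS


def stt_batch_cost(audio_seconds):
    """Estimated USD cost of batch-transcribing `audio_seconds` of audio."""
    return Decimal(audio_seconds) / Decimal(3600) * STT_USD_PER_HOUR


def tts_cost(num_chars):
    """Estimated USD cost of synthesizing `num_chars` characters."""
    return Decimal(num_chars) / Decimal(1_000_000) * TTS_USD_PER_MILLION_CHARS


# Example workload: 500 hours of call recordings and 2 million characters of speech.
print("STT:", stt_batch_cost(500 * 3600))
print("TTS:", tts_cost(2_000_000))
```

Under these assumptions, 500 hours of batch transcription comes to $50 and 2 million synthesized characters to $8.40.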

xAI (@xai) on X:

Grok's Speech to Text API is now available. Instant, multi-speaker transcription across 25 languages - at the best price in the market. https://t.co/eGbB2bDtZf
