HeadsUpAI

xAI Launches Standalone Grok Speech to Text and Text to Speech APIs


xAI, the AI research company behind the Grok model family, the Grok AI assistant, and the xAI API, has launched standalone Grok Speech to Text (STT) and Grok Text to Speech (TTS) APIs. These endpoints offer, as separate products, the audio stack used in Tesla and Starlink. The STT engine handles multi-speaker transcription, while the TTS engine accepts inline speech tags such as [laugh] for expressive delivery.

This move positions xAI as a direct competitor to specialized speech providers. By focusing on Inverse Text Normalization (converting spoken forms, such as numbers, into structured written text), the models target business domains like legal and medical transcription. xAI's internal benchmarks claim high accuracy in identifying entities during phone calls.
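To make Inverse Text Normalization concrete, here is a minimal toy sketch of the idea: it rewrites runs of spoken number words as digits. It is not xAI's implementation; the vocabulary, the function names (`words_to_number`, `normalize`), and the whitespace-only tokenization are all assumptions made for illustration, and a production system would also handle dates, currency, ordinals, and punctuation.

```python
# Toy Inverse Text Normalization: spoken number words -> digits.
# Illustrative only; names and vocabulary are assumptions, not xAI's method.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred", "thousand"}


def words_to_number(tokens):
    """Convert a run of lowercase number words (below one million) to an int."""
    total, current = 0, 0
    for tok in tokens:
        if tok in UNITS:
            current += UNITS[tok]
        elif tok in TENS:
            current += TENS[tok]
        elif tok == "hundred":
            current = max(current, 1) * 100
        elif tok == "thousand":
            total += max(current, 1) * 1000
            current = 0
    return total + current


def normalize(text):
    """Replace each consecutive run of number words with its digit form."""
    out, run = [], []
    for tok in text.split():
        if tok.lower() in NUMBER_WORDS:
            run.append(tok.lower())
            continue
        if run:
            out.append(str(words_to_number(run)))
            run = []
        out.append(tok)
    if run:  # flush a run that ends the sentence
        out.append(str(words_to_number(run)))
    return " ".join(out)


print(normalize("the invoice total is two thousand three hundred forty five dollars"))
```

The sketch is deliberately naive: it would also rewrite "one" in a phrase like "no one", which is exactly the kind of ambiguity a real ITN model has to resolve from context.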

Developers can integrate both features through a REST API, or through a WebSocket API for real-time streaming. Grok STT is priced at $0.10 per hour for batch processing; Grok TTS is priced at $4.20 per million characters. Both are available now through the xAI API console, which includes a playground for testing expressive speech tags.
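For budgeting, the quoted prices can be turned into a rough estimate. The sketch below assumes billing is strictly linear in audio duration and character count, with no minimums, rounding, or tiering; those billing details, and the function names, are assumptions rather than anything stated by xAI. The STT figure is the batch rate quoted above.

```python
# Rough cost estimate from the prices quoted in this article.
# Assumes linear billing with no minimums or rounding (an assumption).

STT_USD_PER_HOUR = 0.10            # batch speech-to-text rate
TTS_USD_PER_MILLION_CHARS = 4.20   # text-to-speech rate


def stt_cost(audio_seconds: float) -> float:
    """Estimated USD cost to batch-transcribe the given amount of audio."""
    return audio_seconds / 3600 * STT_USD_PER_HOUR


def tts_cost(num_chars: int) -> float:
    """Estimated USD cost to synthesize the given number of characters."""
    return num_chars / 1_000_000 * TTS_USD_PER_MILLION_CHARS


# Example: 10 hours of recorded calls and 500,000 characters of synthesized speech.
print(f"STT: ${stt_cost(10 * 3600):.2f}")
print(f"TTS: ${tts_cost(500_000):.2f}")
```

Under these assumptions, 10 hours of batch transcription comes to about $1.00 and 500,000 characters of speech to about $2.10.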

xAI (@xai) on X:

Grok's Speech to Text API is now available. Instant, multi-speaker transcription across 25 languages - at the best price in the market. https://t.co/eGbB2bDtZf
