Voice Cloning is now live via the xAI API! Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more. https://t.co/EjxjXssQtd https://t.co/iR8AW2UOgo
xAI Launches Voice Cloning API to Personalize Grok Voice Agents
xAI, the AI research company behind the Grok model family, has launched Custom Voices, a feature that clones a human voice from one minute of recorded speech. The system generates a production-ready voice_id that integrates with the Grok Text to Speech API and the Grok Voice Agent API.
- Cloning time: Under 2 minutes
- Audio required: 1 minute
- Languages supported: 28
- Built-in voice catalog: 80+ voices
- Pricing: No additional fee
- Safety verification: Two-stage (Passphrase and Similarity)
This update provides the personalization layer for xAI's audio stack, moving beyond the five natural voices the company released in March. While competitors such as Runway target creators with custom voice design, xAI emphasizes enterprise reliability by requiring a two-stage safety check that uses speech-to-text and speaker embeddings to verify identity.
You can now manage a catalog of custom and built-in voices from the xAI console to power multilingual support or localized content. Custom voices are available immediately via the xAI API at no additional cost beyond standard usage rates, supporting both REST and WebSocket streaming for 28 languages.
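As a rough illustration of how a voice_id might be passed along with a text-to-speech request, the sketch below only builds and validates a JSON request body; it does not call any service. The endpoint URL, the field names (`text`, `voice_id`, `language`), and the sample ID `voice_abc123` are placeholders assumed for illustration. None of them come from xAI's documentation, so consult the official API reference for the real request shape.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical endpoint: the article does not give the real URL.
TTS_ENDPOINT = "https://api.example.com/v1/tts"


@dataclass
class TtsRequest:
    """Illustrative request body; every field name here is an assumption."""
    text: str
    voice_id: str
    language: str = "en"


def build_tts_payload(text: str, voice_id: str, language: str = "en") -> str:
    """Validate the inputs and serialize them as a JSON request body."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if not voice_id:
        raise ValueError("voice_id must not be empty")
    return json.dumps(asdict(TtsRequest(text, voice_id, language)))


# The same (made-up) voice_id reused for a Spanish-language request.
payload = build_tts_payload("Welcome back!", "voice_abc123", "es")
print(payload)
```

Keeping the payload builder separate from the transport means the same body could, in principle, be sent over either REST or a WebSocket stream once the real endpoint details are known.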
Still wondering? A few quick answers below.
xAI Custom Voices is a feature within the xAI API and console that allows users to create a digital clone of a human voice. By recording approximately one minute of speech, the system generates a unique voice ID that can be used for text-to-speech narration, video game characters, and real-time conversational agents.
To clone a voice, you record about one minute of natural speech in the xAI console. The system uses a two-stage verification process to confirm the speaker's identity and intent before processing the audio. Once verified, a production-ready voice model is delivered in under two minutes for use across xAI's audio endpoints.
xAI implements a two-stage verification pipeline to prevent unauthorized cloning. First, a passphrase check requires the speaker to read a specific phrase aloud to confirm presence and consent. Second, the system compares speaker embeddings (numerical representations of vocal characteristics) between the passphrase and the full recording to ensure they belong to the same person.
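The two stages described above can be sketched in a few lines. This is a minimal illustration, not xAI's implementation: it assumes the transcript and the embedding vectors have already been produced by some speech-to-text and speaker-embedding models, it uses cosine similarity as the comparison metric, and the 0.75 threshold is an arbitrary placeholder. All function names are hypothetical.

```python
import math
import re


def normalize(text: str) -> str:
    """Casefold, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.casefold()).split())


def passphrase_matches(transcript: str, passphrase: str) -> bool:
    """Stage 1: the transcribed speech must match the prompted phrase."""
    return normalize(transcript) == normalize(passphrase)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def verify_speaker(
    transcript: str,
    passphrase: str,
    passphrase_emb: list[float],
    recording_emb: list[float],
    threshold: float = 0.75,  # assumed value, not a documented one
) -> bool:
    """Stage 1 (passphrase) and Stage 2 (similarity) must both pass."""
    if not passphrase_matches(transcript, passphrase):
        return False
    return cosine_similarity(passphrase_emb, recording_emb) >= threshold


# Toy 2-dimensional embeddings; real speaker embeddings are much larger.
print(verify_speaker("The quick, brown fox!", "the quick brown fox",
                     [1.0, 0.0], [0.9, 0.1]))
```

Checking the passphrase first means the embedding comparison runs only on recordings that already passed the consent step.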
There is no additional fee to create or use custom voices within the xAI ecosystem. Users are charged only the standard rates for the Text to Speech or Voice Agent APIs, so developers can add personalized voices to their applications without costs beyond their existing API consumption.
The xAI Voice Library supports over 80 built-in voices across 28 different languages, including English, Spanish, French, German, Chinese, and Japanese. Custom voices created through the cloning process inherit these multilingual capabilities, allowing a single cloned identity to narrate content or interact with users in any of the supported languages.


