HeadsUpAI

Alibaba Fun-Realtime-TTS claims top spot on Speech Arena leaderboard

Artificial Analysis has ranked Alibaba's Fun-Realtime-TTS as the top model on its Speech Arena leaderboard. The model reached an Elo score (a relative rating derived from head-to-head comparisons) of 1,219 (+16/-16) across 962 arena appearances. This is the first time an Alibaba system has led the rankings; its predecessor, Fun-Realtime-TTS-Preview, peaked at #7. The result adds to Alibaba's recent success with the Qwen series.
Elo score: 1,219
Arena rank: #1
Pricing: $27.59 per 1M characters
Features: Voice cloning, voice design, multilingual support

Competition at the top of the text-to-speech leaderboard is tightening, with only 24 Elo points separating the top five models. Fun-Realtime-TTS leads Gemini 3.1 Flash TTS by just 5 points, which falls within its reported +16/-16 uncertainty range. Even so, the result suggests that Chinese labs are closing the quality gap. The shift follows the arrival of other high-performance alternatives in the speech market, such as Voxtral TTS.
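To put these gaps in perspective, the sketch below converts rating differences into expected head-to-head preference rates. It uses the standard Elo logistic formula with a 400-point scale. That scale is an assumption: Artificial Analysis may fit ratings differently (for example, with a Bradley-Terry model), so treat the numbers as illustrative. The 1,195 rating for the fifth-ranked model is inferred from the 24-point spread and is not a published figure.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings reported on the Speech Arena leaderboard
fun_realtime_tts = 1219
gemini_31_flash_tts = 1214

# Hypothetical: fifth place inferred from the stated 24-point spread, not a published score
fifth_place_estimate = fun_realtime_tts - 24

print(f"vs Gemini 3.1 Flash TTS: {elo_expected_score(fun_realtime_tts, gemini_31_flash_tts):.3f}")
print(f"vs fifth place (24-point gap): {elo_expected_score(fun_realtime_tts, fifth_place_estimate):.3f}")
```

Under these assumptions, a 5-point lead corresponds to roughly a 50.7% preference rate. Even the full 24-point spread reaches only about 53.4%, so listeners would split nearly evenly between the top models.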

Developers can access Fun-Realtime-TTS through Alibaba Cloud's API for $27.59 per million characters. That price sits between Gemini 3.1 Flash TTS ($18.30) and Cartesia Sonic 3.5 ($39). The system supports real-time speech generation with integrated voice cloning and voice design. It also offers multilingual output and support for regional accents and dialects, which makes it a candidate for global voice agents where low-latency interaction is a priority.
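As a rough budgeting aid, the following sketch estimates monthly synthesis cost at the per-1M-character list prices quoted in the Artificial Analysis post. The 5-million-character monthly volume is hypothetical. The calculation also assumes simple linear per-character billing with no tiers, free quotas or minimums, which may not match Alibaba Cloud's actual pricing terms.

```python
# Per-1M-character list prices (USD) as quoted by Artificial Analysis
PRICE_PER_MILLION_CHARS = {
    "Fun-Realtime-TTS": 27.59,
    "Gemini 3.1 Flash TTS": 18.30,
    "Inworld Realtime TTS 1.5 Max": 35.00,
    "Cartesia Sonic 3.5": 39.00,
}


def synthesis_cost(characters: int, price_per_million: float) -> float:
    """USD cost of synthesizing `characters` characters, assuming linear per-character billing."""
    return characters / 1_000_000 * price_per_million


monthly_chars = 5_000_000  # hypothetical monthly volume for a voice agent

# Print each model's estimated monthly cost, cheapest first
for model, price in sorted(PRICE_PER_MILLION_CHARS.items(), key=lambda kv: kv[1]):
    print(f"{model:<30} ${synthesis_cost(monthly_chars, price):,.2f}")
```

At that volume, Fun-Realtime-TTS works out to about $137.95 a month. Gemini 3.1 Flash TTS comes to roughly $91.50 and Sonic 3.5 to $195.00.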

Artificial Analysis (@ArtificialAnlys) on X:

Alibaba's Fun-Realtime-TTS takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Google's Gemini 3.1 Flash TTS and Inworld's Realtime TTS-2 Research Preview. Competition at the top of the TTS Arena is tighter than ever, with just 24 Elo points separating the top five models.

Fun-Realtime-TTS takes the top spot with the highest Elo score on the leaderboard. @Ali_TongyiLab @AlibabaGroup's previous Fun-Realtime-TTS-Preview reached #7 on the leaderboard, making this Alibaba's first #1 model in the Artificial Analysis Speech Arena. Fun-Realtime-TTS is available via Alibaba Cloud with API access for developers.

Key takeaways:
➤ Quality: Fun-Realtime-TTS has an Elo score of 1,219 (+16/-16) based on 962 arena appearances, placing it ahead of Gemini 3.1 Flash TTS at 1,214, Inworld Realtime TTS-2 Research Preview at 1,209, and Cartesia Sonic 3.5 at 1,203.
➤ Pricing: Fun-Realtime-TTS is priced at $27.59/1M characters, positioning it between Gemini 3.1 Flash TTS at $18.3/1M characters and Inworld Realtime TTS 1.5 Max at $35/1M characters, while remaining below Sonic 3.5 at $39/1M characters.
➤ Features: Fun-Realtime-TTS supports real-time speech generation with voice cloning, voice design, multilingual output, and support for regional accents and dialects.

See more details and listen to samples below 🧵

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
