HeadsUpAI

MiniMax Brings 600 Expressive Voices to Together AI for Real-Time Agents

Together AI, a research-optimized inference platform, integrated minimax/speech-2.8-turbo into its voice catalogue, adding over 600 new expressive voices. MiniMax, an AI company building multimodal models, designed this enterprise-grade text-to-speech model for high-fidelity synthesis. It supports 40+ languages and features built-in emotion control for natural-sounding dialogue.

Model: MiniMax Speech 2.8 Turbo
Voice count: 600+ voices
Language support: 40+ languages
Platform: Together AI
Use cases: Audiobook, meditation, support, and more

This integration follows Together AI's unified voice agent cloud launch, which co-locates speech-to-text and LLMs to minimize latency. By adding MiniMax's latest model, the platform addresses the expressiveness gap in real-time conversational AI. Developers can now combine low-latency response times with human-level prosody, which is critical for autonomous agents.

You can access these voices through Together AI's dedicated infrastructure for production-scale applications. The new Voice Finder tool allows you to filter the 600+ options by use case, such as audiobook narration or customer support. Deployment options are available via the Together AI dashboard for users building real-time voice-first products.
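
For readers who prefer to see what a call might look like, here is a minimal sketch of building a text-to-speech request for this model. Only the model identifier comes from the announcement. The endpoint URL, the payload field names, and the voice ID are assumptions modeled on OpenAI-style speech APIs, and the helper name build_speech_request is hypothetical. Check Together AI's API reference before relying on any of them.

```python
import json
import os
import urllib.request

# Assumption: an OpenAI-style speech endpoint. Confirm the exact path
# and payload fields in Together AI's API reference.
SPEECH_URL = "https://api.together.xyz/v1/audio/speech"

# Model identifier as given in the announcement.
MODEL_ID = "minimax/speech-2.8-turbo"


def build_speech_request(text, voice, api_key):
    """Build (but do not send) a text-to-speech HTTP request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumed field names: "model", "input", "voice".
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(
        "Thanks for calling. How can I help?",
        voice="example-support-voice",  # hypothetical placeholder, not a real voice ID
        api_key=os.environ.get("TOGETHER_API_KEY", ""),
    )
    # Sending is left commented out so the sketch runs offline:
    # with urllib.request.urlopen(req) as resp:
    #     audio_bytes = resp.read()
    print(req.get_method(), req.full_url)
```

The request is built separately from sending it so the payload can be inspected or logged first. A real voice ID would come from the Voice Finder or the voice catalogue.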

Still wondering? A few quick answers below.

MiniMax Speech 2.8 Turbo is an enterprise-grade text-to-speech model designed for high-fidelity and expressive audio synthesis. It is specifically optimized for real-time voice agents that require low-latency performance. The model supports over 40 languages and allows for fine-grained emotion control to make AI-generated voices sound more natural and human-like during conversations.

The integration brings more than 600 new voices to the Together AI platform. These voices cover a wide range of styles and personas, including audiobook narrators, meditation guides, news broadcasters, and customer support representatives. Users can browse and filter this extensive catalogue using a dedicated Voice Finder tool to select the best match for their specific application.

The model is built for real-time use cases where speed and expressiveness are critical. By deploying on Together AI's dedicated infrastructure, developers can achieve the low-latency performance necessary for interactive voice agents. The aim is for high-quality audio generation not to introduce delays that would disrupt the flow of a natural conversation.

Developers can access the model and its 600+ voices through the Together AI platform. It is available on dedicated infrastructure, which provides the reliable throughput and consistent latency required for production-scale deployments. Users can test different voices and configurations through the Voice Catalogue and Voice Finder tool before integrating them into their own AI-powered voice applications.
