MiniMax Brings 600 Expressive Voices to Together AI for Real-Time Agents

MiniMax

May 20, 2026 · Updated Jun 13, 2026

MiniMax integrated its Speech 2.8 Turbo model into Together AI, adding over 600 expressive voices to the platform's catalogue. This expansion provides developers with high-fidelity, low-latency audio synthesis specifically optimized for building autonomous voice agents on dedicated infrastructure.

Together AI, a research-optimized inference platform, lists the model in its voice catalogue as minimax/speech-2.8-turbo. MiniMax, an AI company building multimodal models, designed this enterprise-grade text-to-speech model for high-fidelity synthesis. It supports 40+ languages and includes built-in emotion control for natural-sounding dialogue.

Model: MiniMax Speech 2.8 Turbo
Voice count: 600+ voices
Language support: 40+ languages
Platform: Together AI
Use cases: Audiobook narration, meditation, news broadcasting, customer support, and more

This integration follows Together AI's unified voice agent cloud launch, which co-locates speech-to-text and LLMs to minimize latency. By adding MiniMax's latest model, the platform addresses the expressiveness gap in real-time conversational AI. Developers can now combine low-latency response times with human-level prosody, which is critical for autonomous agents.

You can access these voices through Together AI's dedicated infrastructure for production-scale applications. The new Voice Finder tool allows you to filter the 600+ options by use case, such as audiobook narration or customer support. Deployment options are available via the Together AI dashboard for users building real-time voice-first products.
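As a rough illustration of what filtering a voice catalogue by use case looks like, here is a minimal Python sketch. It does not call Together AI or the Voice Finder tool. The `Voice` record, its field names, the sample voice IDs, and the `find_voices` helper are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Voice:
    """Hypothetical catalogue entry; field names are assumptions, not a real schema."""
    voice_id: str
    language: str
    use_cases: Tuple[str, ...]


# Made-up sample records standing in for a much larger catalogue.
CATALOGUE: List[Voice] = [
    Voice("narrator-01", "en", ("audiobook",)),
    Voice("calm-guide-02", "en", ("meditation", "audiobook")),
    Voice("agent-03", "es", ("customer_support",)),
]


def find_voices(
    catalogue: List[Voice],
    use_case: str,
    language: Optional[str] = None,
) -> List[Voice]:
    """Return voices tagged with use_case, optionally restricted to one language."""
    return [
        voice
        for voice in catalogue
        if use_case in voice.use_cases
        and (language is None or voice.language == language)
    ]


if __name__ == "__main__":
    for voice in find_voices(CATALOGUE, "audiobook", language="en"):
        print(voice.voice_id)
```

A real integration would take voice identifiers and tags from the platform's own catalogue rather than from a hard-coded list like this one.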

View the full update on voicefinder.together.ai

MiniMax (official)

@MiniMax_AI · May 20

600+ new voices powered by MiniMax Speech 2.8 Turbo are now on Together AI @togethercompute 🎙️✨ Try it today: https://t.co/faZb6Q1lui

Still wondering? A few quick answers below.

What is MiniMax Speech 2.8 Turbo?

MiniMax Speech 2.8 Turbo is an enterprise-grade text-to-speech model designed for high-fidelity, expressive audio synthesis. It is optimized for real-time voice agents that require low-latency performance. The model supports over 40 languages and offers fine-grained emotion control so that AI-generated voices sound more natural and human-like in conversation.

How many voices are available in the MiniMax Speech 2.8 Turbo integration?

The integration brings more than 600 new voices to the Together AI platform. These voices cover a wide range of styles and personas, including audiobook narrators, meditation guides, news broadcasters, and customer support representatives. Users can browse and filter the catalogue with a dedicated Voice Finder tool to select the best match for their application.

Is MiniMax Speech 2.8 Turbo available for real-time applications?

Yes, the model is built for real-time use cases where speed and expressiveness are critical. By deploying on Together AI's dedicated infrastructure, developers can achieve the low-latency performance that interactive voice agents need, so that high-quality audio generation does not add delays that would disrupt the flow of a natural conversation.

How can developers access MiniMax Speech 2.8 Turbo on Together AI?

Developers can access the model and its 600+ voices through the Together AI platform. It is available on dedicated infrastructure, which provides the reliable throughput and consistent latency required for production-scale deployments. Users can test different voices and configurations through the Voice Catalogue and Voice Finder tool before integrating them into their own voice applications.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from MiniMax →

Keep reading

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AI is now powering inference for MiniMax M3, a multimodal model featuring a 1-million-token context window. The model uses a new sparse attention architecture to process massive datasets with significantly lower computational overhead than previous-generation models.

MiniMax · Apr 15

MiniMax Launches Open Source Music Skills for Agents to Compose and Sing

MiniMax open-sourced three Music Skills that allow AI agents to generate full tracks, sing in character, and curate local music libraries. By moving music generation from a standalone tool to a native agent capability, developers can now build multimodal agents that use audio as a functional output.

Fireworks AI · Jun 4

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI is now powering inference for MiniMax M3, a multimodal model featuring a novel sparse attention architecture. The partnership enables 15.6x faster decoding at 1-million-token context, making real-time agentic workflows viable at scale.

Vercel · Jun 2

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Vercel has integrated the MiniMax M3 foundation model into its AI Gateway, enabling developers to access 1-million-token context and native multimodality through the AI SDK. The model currently leads open-source rankings on Next.js benchmarks, particularly when paired with agentic instructions.
