Microsoft MAI-Transcribe-1.5 delivers top-tier accuracy at 276x real-time speed

Artificial Analysis

Jun 2, 2026 · Updated Jun 20, 2026

Microsoft has released MAI-Transcribe-1.5, a speech-to-text model that ranks third for accuracy while processing audio at 276x real-time speed. The model leads the accuracy-speed Pareto frontier, offering a high-performance alternative for high-volume enterprise audio workloads.

Artificial Analysis evaluated Microsoft's new MAI-Transcribe-1.5 and measured a 2.4% word error rate at 276x real-time speed. The model ranks third for accuracy on the AA-WER leaderboard and is the fastest of the ten most accurate systems. Its speed factor (the ratio of audio duration to processing time) is more than double that of the next-fastest model in that group.
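To make the speed factor concrete, here is a minimal Python sketch that turns the ratio defined above into wall-clock processing time. The function name `processing_seconds` is illustrative only and is not part of any Microsoft or Artificial Analysis API. The sketch also assumes the 276x figure holds uniformly, which real workloads may not match.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Return the time needed to transcribe audio at a given speed factor.

    speed_factor = audio duration / processing time, so
    processing time = audio duration / speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# One hour of audio at the reported 276x speed factor.
one_hour = processing_seconds(3600, 276)
print(f"{one_hour:.1f} s")  # roughly 13 seconds
```

At this rate, an hour-long recording would take about 13 seconds to process, before any network or queueing overhead.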

Model: MAI-Transcribe-1.5
Word Error Rate: 2.4%
Speed Factor: 276x real-time
Price: $6 per 1,000 minutes
Language Support: 43 languages
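
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below implements that textbook definition in Python. It is not the AA-WER methodology, whose text normalization and dataset weighting are not described in this article, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 2.4% WER corresponds to roughly 24 word errors per 1,000 reference words.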

The model leads the accuracy-speed Pareto frontier, offering a high-throughput alternative to Cohere Transcribe. It follows other high-ranking media models from Microsoft, including MAI-Image-2.5, which recently secured a top-three spot for image quality. MAI-Transcribe-1.5 pairs transcription precision with processing speed for production environments.
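
A model sits on the accuracy-speed Pareto frontier when no other model is at least as accurate and at least as fast while being strictly better on one of the two. The Python sketch below shows that check. The four entries are toy placeholder points labeled A to D, not measurements of any real model, and the function name is illustrative.

```python
def pareto_frontier(models):
    """Return names of models not dominated on (lower WER, higher speed).

    models: list of (name, wer_percent, speed_factor) tuples.
    """
    frontier = []
    for name, wer, speed in models:
        dominated = any(
            other_wer <= wer
            and other_speed >= speed
            and (other_wer < wer or other_speed > speed)
            for _, other_wer, other_speed in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Toy placeholder data, NOT real benchmark results.
toy_models = [
    ("A", 2.0, 50),
    ("B", 2.5, 300),
    ("C", 3.0, 100),  # dominated by B: less accurate and slower
    ("D", 4.0, 400),
]
print(pareto_frontier(toy_models))  # ['A', 'B', 'D']
```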

Available for $6 per 1,000 minutes via Microsoft Foundry, the model supports 43 languages and keyword biasing for specialized terms. It posted a 1.6% error rate on the VoxPopuli dataset. That combination makes it a viable choice for high-volume enterprise audio workloads that need both speed and reliability.
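
As a rough budgeting aid, the listed price can be converted into a cost estimate for a given audio volume. The Python sketch below assumes simple linear pricing at $6 per 1,000 minutes. Any volume discounts, minimum charges, or billing rounding are not covered in this article and are ignored here, and the function name is illustrative.

```python
def transcription_cost_usd(audio_minutes: float,
                           price_per_1000_min: float = 6.0) -> float:
    """Estimate cost assuming linear pricing per 1,000 audio minutes."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes / 1000 * price_per_1000_min


# 10,000 hours of audio, expressed in minutes.
print(transcription_cost_usd(10_000 * 60))  # 3600.0
```

Under that assumption, the price works out to $0.006 per audio minute, or $0.36 per audio hour.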

View the full update on artificialanalysis.ai

Artificial Analysis

@ArtificialAnlys · Jun 2

Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier.

MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER) and ElevenLabs Scribe v2 (2.2% WER).

The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time. This is more than double the speed of the second-fastest model in the top 10 for accuracy.

The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese.

