HeadsUpAI

Microsoft MAI-Transcribe-1.5 delivers top-tier accuracy at 276x real-time speed

Artificial Analysis has evaluated Microsoft's new MAI-Transcribe-1.5 and found that it achieves a 2.4% word error rate (WER) at 276x real-time speed. It ranks third for accuracy on the AA-WER leaderboard and is the fastest of the ten most accurate models. Its speed factor (the ratio of audio duration to processing time) is more than double that of the next-fastest model in that top ten.
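As a quick worked example of the speed-factor definition above, here is a minimal Python sketch. The function names are hypothetical, and the arithmetic assumes the speed factor holds steadily for the whole file. Real wall-clock time will also include upload, network latency, and queueing, which this idealized calculation ignores.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Idealized processing time for a given speed factor.

    Speed factor = audio duration / processing time, so
    processing time = audio duration / speed factor.
    Assumption: constant throughput, no network or queueing overhead.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Speed factor: audio duration divided by processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    one_hour = 3600.0
    # At ~276x real time, one hour of audio takes roughly 13 seconds.
    print(f"{processing_seconds(one_hour, 276):.1f} s")
```

At ~276x, an hour of audio works out to roughly 13 seconds of processing; a model running at 100x would need 36 seconds for the same hour.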
Model: MAI-Transcribe-1.5
Word error rate: 2.4%
Speed factor: 276x real-time
Price: $6 per 1,000 minutes
Language support: 43 languages
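For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. The Python sketch below implements that textbook definition. It is not Artificial Analysis' AA-WER pipeline: it assumes lowercase, whitespace-split tokens with no punctuation or number normalization, and it scores a single utterance rather than aggregating across datasets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: lowercase + whitespace tokenization only; benchmark suites such
    as AA-WER apply their own text normalization, which is not modeled here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # One row of the dynamic-programming table: prev[j] is the edit distance
    # between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the patient was given ten milligrams"
    hyp = "the patient was given tin milligrams"
    print(f"{word_error_rate(ref, hyp):.3f}")  # one substitution out of six words
```

Here one substitution out of six reference words gives 0.167. By the same arithmetic, a 2.4% WER corresponds to about 2.4 word errors per 100 reference words.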

The model leads the accuracy-speed Pareto frontier, offering a high-throughput alternative to Cohere Transcribe and narrowing the gap between transcription accuracy and processing speed in production environments. It also continues a run of high-ranking media models from Microsoft, including MAI-Image-2.5, which recently secured a top-three spot for image quality.
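To make "Pareto frontier" concrete: a model sits on the accuracy-speed frontier if no other model is at least as accurate and at least as fast while being strictly better on one of the two. The Python sketch below assumes just two axes (WER and speed factor). The model entries are hypothetical placeholders, not leaderboard data, and Artificial Analysis' own charting method may differ.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    wer: float           # word error rate, lower is better
    speed_factor: float  # audio duration / processing time, higher is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.wer <= b.wer and a.speed_factor >= b.speed_factor
    strictly_better = a.wer < b.wer or a.speed_factor > b.speed_factor
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Models not dominated by any other model, sorted by ascending WER."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.wer)


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    models = [
        Model("hypothetical-A", wer=0.020, speed_factor=50),
        Model("hypothetical-B", wer=0.024, speed_factor=276),
        Model("hypothetical-C", wer=0.030, speed_factor=120),  # dominated by B
    ]
    print([m.name for m in pareto_frontier(models)])
```

With these placeholder numbers, A stays on the frontier because it is the most accurate, B stays because it is the fastest, and C drops off because B beats it on both axes. A model does not dominate itself, since it is never strictly better than its own scores.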

Available via Microsoft Foundry for $6 per 1,000 minutes, the model supports 43 languages and keyword biasing for specialized terms. It performed especially well on the VoxPopuli dataset, with a 1.6% error rate. This makes it a viable choice for high-volume enterprise audio workloads that need both speed and accuracy without the usual trade-off between the two.
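At the quoted list price, cost scales linearly with audio minutes: $6 / 1,000 = $0.006 per minute, or $0.36 per hour of audio. The Python sketch below assumes simple per-minute billing at that price. Billing granularity, rounding, regional pricing, and any other Microsoft Foundry charges are not described in the article and are not modeled; the 10,000-hour workload is a hypothetical example.

```python
from decimal import Decimal

# List price quoted in the article: $6 per 1,000 audio minutes.
PRICE_USD_PER_1000_MINUTES = Decimal("6")


def transcription_cost_usd(audio_minutes: float) -> Decimal:
    """Estimated cost, assuming simple linear per-minute billing (an assumption)."""
    minutes = Decimal(str(audio_minutes))  # str() avoids binary float artifacts
    if minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return minutes * PRICE_USD_PER_1000_MINUTES / Decimal(1000)


if __name__ == "__main__":
    monthly_minutes = 10_000 * 60  # hypothetical workload: 10,000 hours of audio per month
    print(f"${transcription_cost_usd(monthly_minutes):,.2f}")
```

Under these assumptions the hypothetical 10,000-hour month prints `$3,600.00`. `Decimal` is used instead of `float` so that currency amounts stay exact.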

Artificial Analysis (@ArtificialAnlys) on X:

Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model with a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3) and leading the accuracy-speed Pareto frontier.

MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER) and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real time, more than double the speed of the second-fastest model in the top 10.

The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to 43 languages including English, French, Arabic, Japanese, and Chinese.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.