MiniMax M3 arrives with MiniMax Sparse Attention (MSA), delivering 15.6x faster decoding at 1M tokens. We're partnering with @MiniMax_AI to power the inference behind this week's launch. Head to https://t.co/kZWnBSmlt0 to take it for a spin. Once the model weights are released, M3 will be available to the Fireworks community.
Fireworks AI hosts MiniMax M3 with 15.6x faster long-context decoding
- Decoding speedup: 15.6x at 1M tokens
- Attention overhead: reduced from 30% to 5%
- Prefill speedup: 9.7x at 1M tokens
This update targets the main bottleneck in long-context AI: the computational cost of attention. By cutting attention kernel overhead from 30% to 5%, the model can work with long inputs in uncompressed form without the usual performance penalty. MiniMax M3 also reaches frontier-level results on benchmarks such as SWE-Bench Pro, where it scores 59.0% on agentic coding tasks.
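To illustrate why sparse attention reduces decoding cost, here is a minimal sketch of a generic block top-k scheme. It is an assumption for illustration only and is not MiniMax's MSA, whose internals are not described here; the function names and the `block_size` and `top_k` parameters are hypothetical. With dense attention, each new token scores every cached key, so per-step cost grows with context length `n`. The sparse variant first scores one summary per block (about `n / block_size` dot products), then attends only inside the `top_k` best blocks.

```python
import math
import random


def attend(q, keys, values):
    """Single-query scaled dot-product attention over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def dense_decode_step(q, keys, values):
    """Dense attention: the new token attends to all n cached keys."""
    return attend(q, keys, values), len(keys)


def block_topk_decode_step(q, keys, values, block_size=4, top_k=2):
    """Hypothetical block-sparse attention (illustrative, not MSA).

    Each block of cached keys is summarized by its mean key; only the
    top_k highest-scoring blocks are used for the real attention step.
    """
    n, dim = len(keys), len(q)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(idx):
        centroid = [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
        return sum(a * b for a, b in zip(q, centroid))

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    selected = sorted(i for blk in chosen for i in blk)  # keep original key order
    out = attend(q, [keys[i] for i in selected], [values[i] for i in selected])
    return out, len(selected)


if __name__ == "__main__":
    random.seed(0)
    dim, n = 8, 64
    keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    q = [random.gauss(0, 1) for _ in range(dim)]
    _, dense_cost = dense_decode_step(q, keys, values)
    _, sparse_cost = block_topk_decode_step(q, keys, values, block_size=8, top_k=2)
    print(dense_cost, sparse_cost)  # 64 16
```

In this toy setup the sparse step attends to 16 of 64 keys. Because the number of attended keys stays fixed at `top_k * block_size` while `n` grows, the gap widens at long context lengths. Note that block selection itself still scales with `n / block_size`, so a scheme like this is a trade-off rather than a free speedup.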
You can try MiniMax M3 now, with Fireworks AI powering inference, for applications that need retrieval over very large amounts of data. The model accepts interleaved text, image, and video inputs, which supports workflows such as vision-based code evaluation. The weights have not been released yet; once they are, the model will be available to the Fireworks community, following rollouts on SiliconFlow.
Every HeadsUpAI update is written based on its original source and reviewed before it's published.