HeadsUpAI

SiliconFlow adds MiniMax M3 with 1M context and 50 percent discount

SiliconFlow has added support for MiniMax M3, an open-weight model combining native multimodality, a 1-million-token context window, and agentic coding. Built on the MiniMax Sparse Attention (MSA) architecture, which reduces attention overhead to 5%, the model handles video and computer-use while maintaining a 512K-token minimum context.
Context Window
1,000,000 tokens
Architecture
MiniMax Sparse Attention
Prefill Speedup
9.7x at full context
Decoding Speedup
15.6x at full context
Promo Pricing
$0.30 input / $1.20 output per 1M tokens

This release challenges closed systems by matching performance on specialized benchmarks. MiniMax-M3 scores 83.5 on BrowseComp, surpassing Opus 4.7, and outperforms GPT-5.5 on software engineering tasks. Its MSA architecture enables 9.7x faster prefilling and 15.6x faster decoding at full context, reducing the compute required for long-horizon reasoning.

Developers can access the model via SiliconFlow's API, with a 50% discount through June 7th reducing rates to $0.30 per 1M input tokens. Like the model being available on OpenRouter and brought to Together AI, this launch enables autonomous codebase analysis through tools like Claude Code and Cline.

MiniMax (official)
MiniMax (official)
@MiniMax_AI
X

Day-0 on SiliconFlow and 50% off 🔥 the first week frontier coding, 1M context, and native multimodal, all in one open-weights model. This is what we built M3 for. Go try it 👇

1retweets66likes
View on X

Still wondering? A few quick answers below.

MiniMax M3 is an open-weight multimodal model designed for agentic coding and long-context reasoning. It features a 1-million-token context window and native support for text, image, and video inputs. The model is built on a specialized sparse attention architecture to maintain high performance at extreme context lengths.

The MiniMax Sparse Attention (MSA) architecture uses precise KV-block selection and operator-level optimizations to reduce compute requirements. At a 1-million-token context, it delivers 9.7x faster prefilling and 15.6x faster decoding compared to previous generations, spending only 1/20th of the per-token compute of earlier models.

MiniMax M3 outperforms GPT-5.5 and Gemini 3.1 Pro on the SWE-Bench Pro software engineering benchmark. It also achieved a score of 83.5 on BrowseComp, surpassing Opus 4.7 for autonomous browsing tasks. Additionally, it leads major frontier models on SVG-Bench for vector graphic generation.

Developers can access MiniMax M3 through SiliconFlow's OpenAI-compatible API or playground. It is compatible with agentic tools like Claude Code, Cline, and Roo Code. Through June 7th, SiliconFlow is offering a 50% discount, with rates starting at $0.30 per 1M input tokens.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Share this update