SiliconFlow adds MiniMax M3 with 1M context and 50 percent discount

MiniMax

Jun 3, 2026 · Updated Jun 13, 2026

MiniMax M3 is now available on SiliconFlow, bringing frontier-grade agentic coding and a million-token context window to the open-weight ecosystem. The launch includes a week-long introductory discount, making high-capacity multimodal reasoning significantly more accessible for developers.

SiliconFlow has added support for MiniMax M3, an open-weight model combining native multimodality, a 1-million-token context window, and agentic coding. Built on the MiniMax Sparse Attention (MSA) architecture, which reduces attention overhead to 5%, the model handles video and computer-use while maintaining a 512K-token minimum context.

Context Window: 1,000,000 tokens
Architecture: MiniMax Sparse Attention
Prefill Speedup: 9.7x at full context
Decoding Speedup: 15.6x at full context
Promo Pricing: $0.30 input / $1.20 output per 1M tokens

This release challenges closed systems by matching performance on specialized benchmarks. MiniMax-M3 scores 83.5 on BrowseComp, surpassing Opus 4.7, and outperforms GPT-5.5 on software engineering tasks. Its MSA architecture enables 9.7x faster prefilling and 15.6x faster decoding at full context, reducing the compute required for long-horizon reasoning.

Developers can access the model via SiliconFlow's API, with a 50% discount through June 7th reducing rates to $0.30 per 1M input tokens. Like the model being available on OpenRouter and brought to Together AI, this launch enables autonomous codebase analysis through tools like Claude Code and Cline.

View the full update on siliconflow.com

MiniMax (official)

@MiniMax_AIJun 3

Day-0 on SiliconFlow and 50% off 🔥 the first week frontier coding, 1M context, and native multimodal, all in one open-weights model. This is what we built M3 for. Go try it 👇

166

View on X

Still wondering? A few quick answers below.

MiniMax M3 is an open-weight multimodal model designed for agentic coding and long-context reasoning. It features a 1-million-token context window and native support for text, image, and video inputs. The model is built on a specialized sparse attention architecture to maintain high performance at extreme context lengths.

The MiniMax Sparse Attention (MSA) architecture uses precise KV-block selection and operator-level optimizations to reduce compute requirements. At a 1-million-token context, it delivers 9.7x faster prefilling and 15.6x faster decoding compared to previous generations, spending only 1/20th of the per-token compute of earlier models.

MiniMax M3 outperforms GPT-5.5 and Gemini 3.1 Pro on the SWE-Bench Pro software engineering benchmark. It also achieved a score of 83.5 on BrowseComp, surpassing Opus 4.7 for autonomous browsing tasks. Additionally, it leads major frontier models on SVG-Bench for vector graphic generation.

Developers can access MiniMax M3 through SiliconFlow's OpenAI-compatible API or playground. It is compatible with agentic tools like Claude Code, Cline, and Roo Code. Through June 7th, SiliconFlow is offering a 50% discount, with rates starting at $0.30 per 1M input tokens.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from MiniMax →

Keep reading

MiniMax brings M3 to local PCs with 1M context open weights

MiniMax announced that its M3 model is joining the NVIDIA and Microsoft local LLM lineup, with weights releasing to the community within 10 days. The move brings high-capacity multimodal reasoning and coding capabilities directly to local hardware.

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouterJun 1

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter integrated MiniMax-M3, an open-weight multimodal model featuring a 1-million-token context window and specialized sparse attention. By reducing long-context compute costs by 95%, the model enables persistent agentic workflows across massive codebases and video files.

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

OllamaJun 7

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AIJun 4

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI is now powering inference for MiniMax M3, a multimodal model featuring a novel sparse attention architecture. The partnership enables 15.6x faster decoding at 1-million-token context, making real-time agentic workflows viable at scale.

What is MiniMax M3?

How does the MSA architecture improve performance?

What are the benchmark results for MiniMax M3?

How can developers access MiniMax M3 on SiliconFlow?

Keep reading

MiniMax brings M3 to local PCs with 1M context open weights

MiniMax brings M3 to local PCs with 1M context open weights

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Keep reading

MiniMax brings M3 to local PCs with 1M context open weights

MiniMax brings M3 to local PCs with 1M context open weights

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding