MiniMax brings M3 to local PCs with 1M context and open weights

MiniMax

Jun 4, 2026 · Updated Jun 20, 2026

MiniMax announced that its M3 model is joining the NVIDIA and Microsoft local LLM lineup, with weights releasing to the community within 10 days. The move brings high-capacity multimodal reasoning and coding capabilities directly to local hardware.

MiniMax is bringing its MiniMax-M3 model to local hardware as part of a new NVIDIA and Microsoft local LLM lineup announced at GTC Taipei. The model is an open-weight multimodal LLM (a model that processes text, images, and video) with a 1-million-token context window. The full context window remains limited to server-class hardware, but the company will release weights for local deployment in less than 10 days.

Model: MiniMax-M3 (open-weight)
Lineup: NVIDIA + Microsoft Local LLM (GTC Taipei)
Lineup Peers: DeepSeek, Gemma, Qwen, GLM, and others
Windows Architecture: OpenShell agentic stack
Context Window: 1M tokens on server-class hardware, reduced on consumer PCs
Weights Release: Ships in under 10 days

This release bridges the gap between cloud capacity and local privacy. The model uses MiniMax Sparse Attention to reduce the compute overhead of long contexts while maintaining performance. It posts strong agentic coding benchmarks, offering a local alternative for developers who need to process entire codebases or long video files on-device.

Consumer PCs will require quantization (storing model weights at reduced precision so they fit in less memory) and a reduced context window, but full weights will be available for self-hosting in under 10 days. This enables workflows where sensitive data stays local while still benefiting from frontier-level reasoning and native multimodal support.
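To see why quantization matters on consumer hardware, a rough back-of-the-envelope sketch in Python can help. It estimates the memory needed for model weights alone at different precisions. The parameter count below is a hypothetical placeholder, since this article does not state MiniMax-M3's size, and the estimate ignores activations, the context cache, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# HYPOTHETICAL parameter count, for illustration only.
# MiniMax-M3's actual size is not given in this article.
EXAMPLE_PARAMS = 100e9

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gib = weight_memory_gib(EXAMPLE_PARAMS, bits)
        print(f"{label} weights: ~{gib:.0f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why lower-precision builds are the usual route to fitting a large model on a local GPU. Real quantized formats add some overhead for scaling metadata, so actual file sizes will differ from this estimate.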

View the full update on x.com

MiniMax (official) · @MiniMax_AI · Jun 3

We are part of @nvidia and @Microsoft’s Local LLM lineup at #GTC Taipei.🔥 The PC is being reinvented around local, agentic, open-weight models. MiniMax-M3 is built exactly for this future: Open-weight. 1M context. Strong coding. Native multimodality. Excited for what comes next!

Still wondering? A few quick answers below.

What is the MiniMax-M3 local release?

MiniMax-M3 is an open-weight multimodal model being integrated into the NVIDIA and Microsoft local AI ecosystem. It features a 1-million-token context window and specialized coding capabilities, and it is designed to run directly on local hardware rather than relying solely on cloud-based inference for agentic tasks.

Can MiniMax-M3 run on a standard consumer PC?

Yes, but with limitations. The model supports the full 1-million-token context only on server-class hardware. Consumer PCs will use quantized versions, which store weights at reduced precision to fit on local GPUs and run with a smaller context window than the full server-grade deployment.

When will the MiniMax-M3 weights be available?

MiniMax plans to release the model weights to the community in less than 10 days from the GTC Taipei announcement. The release will let developers and users self-host the model on their own infrastructure, enabling private, on-device processing of text, images, and video data.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter integrated MiniMax-M3, an open-weight multimodal model featuring a 1-million-token context window and specialized sparse attention. By reducing long-context compute costs by 95%, the model enables persistent agentic workflows across massive codebases and video files.

SiliconFlow adds MiniMax M3 with 1M context and 50 percent discount

MiniMax · Jun 3

MiniMax M3 is now available on SiliconFlow, bringing frontier-grade agentic coding and a million-token context window to the open-weight ecosystem. The launch includes a week-long introductory discount, making high-capacity multimodal reasoning significantly more accessible for developers.

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama · Jun 7

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI · Jun 4

Fireworks AI is now powering inference for MiniMax M3, a multimodal model featuring a novel sparse attention architecture. The partnership enables 15.6x faster decoding at 1-million-token context, making real-time agentic workflows viable at scale.
