HeadsUpAI

MiniMax brings M3 to local PCs with 1M context open weights

MiniMax is bringing its MiniMax-M3 model to local hardware as part of NVIDIA and Microsoft's Local LLM lineup at GTC Taipei. The model is an open-weight multimodal LLM (a model that processes text, images, and video) with a 1-million-token context window. The full context is intended for server-class hardware, and MiniMax plans to release the weights for local deployment in less than 10 days.
Model: MiniMax-M3 (open-weight)
Lineup: NVIDIA + Microsoft Local LLM (GTC Taipei)
Context Window: 1M on server-class hardware, reduced on consumer hardware
Weights Release: Ships in under 10 days
Consumer Hardware: Quantized runs required

This release aims to bridge the gap between cloud capacity and local privacy. The model uses MiniMax Sparse Attention to reduce attention overhead while maintaining performance. It posts strong results on agentic coding benchmarks, offering a local alternative for developers who need to process entire codebases or long video files on-device.

Consumer PCs will require quantization (storing the model's weights at lower precision so they fit in less memory) and a reduced context window, but the full weights will be available for self-hosting in under 10 days. This enables workflows where sensitive data stays local while still benefiting from frontier-level reasoning and native multimodal support.
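To give a rough sense of why quantization matters on consumer hardware, here is a minimal sketch that estimates the memory needed to hold a model's weights at different precisions. The parameter count is a hypothetical placeholder, since MiniMax-M3's size is not stated here, and the function name is our own. The estimate covers weights only and ignores the memory used by the context itself.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Hypothetical parameter count, for illustration only.
# This is NOT MiniMax-M3's actual size.
EXAMPLE_PARAMS = 100e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(EXAMPLE_PARAMS, bits)
    print(f"{label}: ~{gib:.0f} GiB for weights")
```

Halving the bits per weight halves the weight footprint, which is why lower-precision builds are the usual route to fitting a large model on a local GPU.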

MiniMax (official), @MiniMax_AI, on X:

We are part of @nvidia and @Microsoft’s Local LLM lineup at #GTC Taipei. 🔥 The PC is being reinvented around local, agentic, open-weight models. MiniMax-M3 is built exactly for this future: Open-weight. 1M context. Strong coding. Native multimodality. Excited for what comes next!


Still wondering? A few quick answers below.

MiniMax-M3 is an open-weight multimodal model being integrated into the NVIDIA and Microsoft local AI ecosystem. It features a 1-million-token context window and specialized coding capabilities, designed to run directly on local hardware rather than relying solely on cloud-based inference for agentic tasks.

MiniMax-M3 can run on consumer PCs, but with limitations. The model supports the full 1-million-token context on server-class hardware, while consumer PCs will use quantized versions. These versions store weights at reduced precision to fit on local GPUs and run with a smaller context window than the full server-grade deployment.

MiniMax plans to release the model weights to the community in less than 10 days from the GTC Taipei announcement. This release will allow developers and users to self-host the model on their own infrastructure, enabling private, on-device processing of text, images, and video data.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.