OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter

Jun 1, 2026 · Updated Jun 20, 2026

OpenRouter integrated MiniMax-M3, an open-weight multimodal model featuring a 1-million-token context window and specialized sparse attention. By reducing long-context compute costs by 95%, the model enables persistent agentic workflows across massive codebases and video files.

OpenRouter added MiniMax-M3, a multimodal foundation model (a base AI system trained on broad data) built for long-horizon agentic tasks. It supports a 1-million-token context window and native video inputs. Using MiniMax Sparse Attention, it replaces full attention with KV-block selection, cutting long-context compute costs by 95%.

Context Window: 1,048,576 tokens
Max Output: 512,000 tokens
Input Modalities: Text, Image, Video
Architecture: MiniMax Sparse Attention (MSA)
Pricing (Promo): $0.30/M input, $1.20/M output

This release extends the agentic trajectory of MiniMax M2.7 into multimodal territory. It follows the pattern of DeepSeek-V4 in making 1M-token context the standard for autonomous agents. Native multimodality enables reasoning across interleaved text, image, and video data during complex, multi-step workflows.

Access minimax/minimax-m3 via OpenRouter at a 50% discount through June 7, 2026, priced at $0.30 per million input tokens. The model supports a reasoning parameter to expose internal thinking tokens and is optimized for multi-turn collaboration. Open weights are available on Hugging Face.

View the full update on openrouter.ai

OpenRouter

@OpenRouterJun 1

MiniMax-M3 is live on OpenRouter! A frontier-class open-weight model that combines a 1M-token context window, frontier coding and agentic performance, and native multimodality (image & video) in one model. https://t.co/ocxd2OSYkk

21431

View on X

Still wondering? A few quick answers below.

MiniMax-M3 is a multimodal foundation model designed for long-horizon agentic tasks and coding. It features a 1-million-token context window and native support for text, image, and video inputs, allowing it to reason across massive datasets in a single inference loop.

MiniMax Sparse Attention (MSA) replaces traditional full attention mechanisms with KV-block selection. This architectural shift reduces the computational resources required for long-context processing, cutting per-token compute costs by approximately 95% compared to previous model generations while maintaining output quality.

The model is available via the OpenRouter API as minimax/minimax-m3, featuring a 50% discount during its launch week. Additionally, MiniMax has released the model weights on Hugging Face, allowing developers to download and run the model on their own infrastructure.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenRouter →

Keep reading

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

MiniMax M3 drops attention overhead from 30 to 5 percent

MiniMaxJun 3

MiniMax M3 drops attention overhead from 30 to 5 percent

MiniMax revealed technical highlights for its M3 model, featuring a Sparse Attention architecture that maintains uncompressed data for its 1-million-token context window. The update reduces attention kernel overhead from 30% to 5% of per-decode wall-clock time and introduces vision-coding capabilities where the model self-evaluates its own rendered UI.

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AIJun 3

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AI is now powering inference for MiniMax M3, a multimodal model featuring a 1-million-token context window. The model uses a new sparse attention architecture to process massive datasets with significantly lower computational overhead than previous-generation models.

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

VercelJun 2

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Vercel has integrated the MiniMax M3 foundation model into its AI Gateway, enabling developers to access 1-million-token context and native multimodality through the AI SDK. The model currently leads open-source rankings on Next.js benchmarks, particularly when paired with agentic instructions.

What is MiniMax-M3?

How does MiniMax Sparse Attention work?

Where can I access MiniMax-M3?

Keep reading

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

MiniMax M3 drops attention overhead from 30 to 5 percent

MiniMax M3 drops attention overhead from 30 to 5 percent

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AI powers MiniMax M3 with 1M context and sparse attention

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Keep reading

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

MiniMax M3 drops attention overhead from 30 to 5 percent

MiniMax M3 drops attention overhead from 30 to 5 percent

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AI powers MiniMax M3 with 1M context and sparse attention

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows