Cloudflare adds MiniMax M3 with 1M context for agentic coding

MiniMax

Jun 2, 2026 · Updated Jun 13, 2026

Cloudflare has integrated the MiniMax M3 foundation model into its AI Gateway platform. The update provides developers with a high-context, multimodal model specialized for autonomous coding tasks directly within their existing infrastructure.

Cloudflare integrated the MiniMax M3 foundation model into its AI Gateway at launch, giving developers access to native multimodal processing and a 1-million-token context window (the maximum amount of input a model can process in a single request). The model is specialized for agentic coding (AI that autonomously writes, tests, and debugs code) and is now accessible through a single fetch call.

Model: MiniMax M3
Context Window: 1,000,000 tokens
Platform: Cloudflare AI Gateway

Hosting a model with frontier coding performance provides a specialized alternative to general-purpose proprietary systems. This move follows a pattern of high-context models reaching developer gateways, such as the Qwen3.5-Plus update on Vercel, and joins Vercel's MiniMax M3 integration in expanding access to 1M-context workflows. It positions M3 as a competitor to other 1M-context models like the DeepSeek-V4 release that target autonomous development across massive codebases.

Developers can access minimax-m3 via the AI Gateway now. This availability targets teams building agentic workflows that require processing large repositories or long-form documents without managing their own model infrastructure.
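As a rough illustration of what "a single fetch call" could look like, here is a minimal TypeScript sketch. The announcement does not give a request format, so several details are assumptions: the OpenAI-compatible route on the gateway URL, the bearer-token header, and the use of `minimax-m3` as the literal model string (a provider prefix may be required in practice). The `accountId`, `gatewayId`, and `apiToken` values are hypothetical placeholders. Check the Cloudflare documentation for the actual endpoint and model identifier before relying on this.

```typescript
// Sketch only: route, auth header, and model string are assumptions,
// not details confirmed by the announcement.

interface M3RequestOptions {
  accountId: string; // hypothetical Cloudflare account ID
  gatewayId: string; // hypothetical AI Gateway name
  apiToken: string;  // hypothetical credential
  prompt: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the request separately so it can be inspected without network access.
function buildM3Request(opts: M3RequestOptions): BuiltRequest {
  const url =
    `https://gateway.ai.cloudflare.com/v1/${opts.accountId}/${opts.gatewayId}` +
    `/compat/chat/completions`; // assumed OpenAI-compatible route
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${opts.apiToken}`,
      },
      body: JSON.stringify({
        model: "minimax-m3", // name taken from the article; exact format is an assumption
        messages: [{ role: "user", content: opts.prompt }],
      }),
    },
  };
}

// The single fetch call: send the built request and return the parsed JSON.
async function askM3(opts: M3RequestOptions): Promise<unknown> {
  const { url, init } = buildM3Request(opts);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Gateway request failed with status ${response.status}`);
  }
  return response.json();
}
```

Splitting request construction from the network call keeps the assumed parts (URL and payload) in one place, so they are easy to correct once the real values are known.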

View the full update on developers.cloudflare.com

MiniMax (official)

@MiniMax_AI · Jun 1

M3 on Cloudflare AI Gateway, day one ⚡ Frontier coding, 1M context, and native multimodal and now just one fetch away. It is time to build something. 🦞

Still wondering? A few quick answers below.

What is MiniMax M3?

MiniMax M3 is a natively multimodal foundation model designed for high-performance coding and agentic tasks. It features a massive 1-million-token context window, allowing it to process entire codebases or long documents in a single interaction. The model is available as an open-weight release for broad developer use.

How do I access MiniMax M3 on Cloudflare?

Developers can access MiniMax M3 through the Cloudflare AI Gateway. By using a single fetch call, users can route their AI requests to the M3 model while benefiting from Cloudflare's management features, including rate limiting and caching, without needing to manage the underlying model infrastructure themselves.

What is the Cloudflare AI Gateway discount for MiniMax M3?

To celebrate the launch, Cloudflare is offering a 50% discount on MiniMax M3 inference costs during the first week of availability. This promotion applies specifically to requests with a context size of 512,000 tokens or less, encouraging developers to test the model's performance on mid-to-large scale tasks.
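The promotion rule can be expressed as a small check. The TypeScript sketch below uses only the two figures stated above (50% off, 512,000-token limit). The function name, the `baseCostUsd` input, and the `inFirstWeek` flag are hypothetical, since the article gives neither a price nor exact promotion dates.

```typescript
// Figures from the article: 50% discount, contexts of 512,000 tokens or fewer.
const PROMO_DISCOUNT = 0.5;
const PROMO_MAX_CONTEXT_TOKENS = 512000;

// Hypothetical helper: baseCostUsd is whatever the request would normally cost;
// inFirstWeek must be supplied by the caller because the source gives no dates.
function promoCost(
  baseCostUsd: number,
  contextTokens: number,
  inFirstWeek: boolean
): number {
  const eligible = inFirstWeek && contextTokens <= PROMO_MAX_CONTEXT_TOKENS;
  return eligible ? baseCostUsd * (1 - PROMO_DISCOUNT) : baseCostUsd;
}
```

Under these assumptions, a request at exactly 512,000 tokens qualifies for the discount, while one at 512,001 tokens is billed at the full rate.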

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Vercel adds MiniMax M3 to AI Gateway for 1M context agentic workflows

Vercel has integrated the MiniMax M3 foundation model into its AI Gateway, enabling developers to access 1-million-token context and native multimodality through the AI SDK. The model currently leads open-source rankings on Next.js benchmarks, particularly when paired with agentic instructions.

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama · Jun 7

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

Cloudflare Integrates GPT-5.5 to Power Persistent Autonomous Agents

Cloudflare · Apr 24

Cloudflare added OpenAI's GPT-5.5 to its AI Gateway, featuring a 1M token context window and 2x cost efficiency over competing frontier coding models. The model is optimized for agentic loops, enabling systems to plan, use tools, and self-verify their work until a task is complete.

OpenRouter adds MiniMax-M3 with 1M context for multimodal agentic coding

OpenRouter · Jun 1

OpenRouter integrated MiniMax-M3, an open-weight multimodal model featuring a 1-million-token context window and specialized sparse attention. By reducing long-context compute costs by 95%, the model enables persistent agentic workflows across massive codebases and video files.
