HeadsUpAI

Together AI powers MiniMax M3 with 1M context and sparse attention

Together AI, a research-optimized platform for model inference, is now hosting inference (running a trained model to generate outputs) for MiniMax M3, a multimodal model with a 1-million-token context window. It uses MiniMax Sparse Attention (MSA) to process massive datasets without the exponential compute costs of full attention.
Context Window
1,000,000 tokens
Architecture
MiniMax Sparse Attention
Coding Benchmark
59.0% SWE-Bench Pro
Agent Benchmark
74.2% MCP Atlas
Input Modalities
Text, Image, Video

MiniMax M3 matches full attention performance across multiple benchmarks while reducing per-token compute to 1/20th of previous generations at a 1-million-token context length. This architecture enables autonomous workflows like CUDA kernel optimization, building on the MiniMax M3 technical highlights. The model's native multimodality allows semantic spaces to merge deeply during training.

Access MiniMax M3 via the MiniMax Code app or the Together AI API, available alongside other providers like SiliconFlow. The model supports "thinking" modes for reasoning and "computer use" for desktop automation. Together AI provides the research-optimized infrastructure required to deploy and scale these models in production.

Together AI
Together AI
@togethercompute
X

MiniMax M3 is live and Together AI is powering its inference 🚀 Tomorrow at 6pm PT we're going live on X Spaces with the teams behind the model and the infrastructure to give you a deep dive. https://t.co/wPayfOWmNg

16retweets70likes
View on X

Still wondering? A few quick answers below.

MiniMax M3 is a natively multimodal frontier model designed for complex agentic tasks and long-context reasoning. It supports up to 1 million tokens and is optimized for coding, autonomous research, and desktop automation.

MSA uses a pre-filtering stage to partition data into blocks, avoiding the quadratic computational growth of traditional attention mechanisms. This reduces per-token compute to 1/20th of previous models, enabling faster processing of massive datasets.

The model is built for long-horizon tasks like independent paper reproduction, CUDA kernel optimization, and autonomous software engineering. Its native multimodality also enables "computer use" capabilities for automating cross-application workflows.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Share this update