Xiaomi MiMo Slashes V2.5 API Pricing by 99 Percent

May 27, 2026 · Updated Jun 13, 2026

Xiaomi permanently reduced MiMo-V2.5 Series API costs by up to 99% and eliminated tiered pricing for long-context inputs. The update uses inference optimizations to provide 5–8x more tokens in subscription plans, making high-volume agentic workflows significantly more affordable.

Xiaomi MiMo permanently reduced pricing for its MiMo-V2.5 model series by up to 99%. The update removes tiered pricing for long-context inputs and charges a single rate regardless of context usage. It follows the Xiaomi MiMo-V2.5 series launch, which introduced native multimodality and reasoning capabilities.

Price reduction: Up to 99%
Token Plan quota: 5–8x increase
Context window: 1M tokens
KV Cache data transfer: Reduced to nearly 1/7 of original volume
Effective date: May 26, 2026

This move intensifies the industry-wide shift toward low-cost inference. The cut answers DeepSeek's V4 Pro API discount and lowers the economic barrier for running autonomous agents. The efficiency gains stem from Sliding Window Attention, which reduces KV Cache data transfer by roughly 85%, to nearly one-seventh of its original volume.
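The "85%" and "nearly 1/7" figures describe the same change from two directions, and the same arithmetic applies to the headline price cut. The short sketch below only does that conversion; the function name is illustrative and nothing here comes from Xiaomi's own materials.

```python
from fractions import Fraction


def reduction_percent(remaining_fraction: Fraction) -> float:
    """Percent reduction implied by the fraction of the original that remains."""
    return float((1 - remaining_fraction) * 100)


if __name__ == "__main__":
    # KV Cache transfer volume falling to about one-seventh of the original.
    print(f"{reduction_percent(Fraction(1, 7)):.1f}% less data transferred")
    # A price that falls to one-hundredth of the original is a 99% cut.
    print(f"{reduction_percent(Fraction(1, 100)):.0f}% lower price")
```

Read the other way, a cut of "up to 99%" means the cheapest new rate is about one-hundredth of the old one.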

The new rates are available immediately via the MiMo API, and Token Plan quotas have increased 5–8x at no additional cost. Credits for all subscribers with an active plan have been fully reset to reflect the new limits. While the initial creator incentive program has concluded, benefits for Apache Software Foundation committers remain active.

View the full update on platform.xiaomimimo.com

Xiaomi MiMo

@XiaomiMiMo · May 26

🚀 Better inference efficiency, lower costs, broader access. MiMo-V2.5 Series API pricing is now permanently reduced — by up to 99% compared to previous pricing. ✨ Unified pricing across all context lengths. MiMo Token Plans have also been upgraded: • 5–8× more usable tokens at the same price • Simpler and more transparent billing rules 🎁 As a thank-you to current users, all current Token Plan credits will be fully reset. 🎧 MiMo-V2.5-TTS remains free for a limited time. ⏰ Effective May 26 at 6:00 PM PDT. These improvements are powered by continued inference optimization and serving efficiency upgrades across the MiMo stack. 🛠️ We’ll also publish a detailed technical blog on the inference optimizations later — stay tuned.

Still wondering? A few quick answers below.

What is the new pricing for the Xiaomi MiMo-V2.5 API?

Xiaomi has permanently reduced the API pricing for the MiMo-V2.5 series by up to 99 percent compared to previous rates. The new billing system is simplified and no longer differentiates based on input context length, meaning developers pay a unified rate regardless of how many tokens are used within the one-million-token window.
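To make the billing change concrete, here is a minimal sketch contrasting tiered and unified input pricing. Every rate, tier boundary, and name in it is hypothetical and chosen only for illustration; the article does not list MiMo's actual prices, so check the platform for real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int   # tier applies to requests up to this input length
    usd_per_million: float  # hypothetical rate, not a real MiMo price


# Hypothetical tiered schedule: longer inputs are billed at a higher rate.
TIERED = (Tier(256_000, 1.00), Tier(1_000_000, 2.00))
# Hypothetical unified rate: one price regardless of input length.
UNIFIED_USD_PER_MILLION = 0.10


def tiered_cost(input_tokens: int, tiers=TIERED) -> float:
    """Bill the whole request at the rate of the first tier it fits in."""
    for tier in tiers:
        if input_tokens <= tier.max_input_tokens:
            return input_tokens / 1_000_000 * tier.usd_per_million
    raise ValueError("input exceeds the largest tier")


def unified_cost(input_tokens: int, rate=UNIFIED_USD_PER_MILLION) -> float:
    """Bill every request at the same per-token rate."""
    return input_tokens / 1_000_000 * rate


if __name__ == "__main__":
    for tokens in (100_000, 800_000):
        print(tokens, tiered_cost(tokens), unified_cost(tokens))
```

Under the unified scheme, cost grows linearly with token count, so a long-context agent run no longer jumps to a higher rate once it crosses a tier boundary.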

How did Xiaomi achieve such a large price reduction for MiMo-V2.5?

The price cuts are driven by inference optimizations in the MiMo technical stack. By implementing Sliding Window Attention based on SGLang HiCache, the team reduced KV Cache data transfer volume to nearly one-seventh of previous levels. This increased the number of cacheable tokens fivefold, improving throughput and efficiency.
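As a rough intuition for why Sliding Window Attention shrinks the KV Cache, the sketch below counts cached entries when some layers keep every token and others keep only a recent window. It is a simplified model under stated assumptions: the layer counts and window size are hypothetical, not MiMo-V2.5's published configuration, and it does not model SGLang HiCache or data transfer between cache tiers.

```python
def cached_entries(seq_len: int, full_layers: int, swa_layers: int, window: int) -> int:
    """Count per-layer token entries held in the KV cache for one sequence."""
    if seq_len <= 0 or window <= 0 or full_layers < 0 or swa_layers < 0:
        raise ValueError("seq_len and window must be positive, layer counts non-negative")
    full = full_layers * seq_len                 # full attention keeps every token
    sliding = swa_layers * min(seq_len, window)  # sliding window keeps only the last `window`
    return full + sliding


def hybrid_to_full_ratio(seq_len: int, full_layers: int, swa_layers: int, window: int) -> float:
    """Cache size of the hybrid model relative to an all-full-attention model."""
    baseline = cached_entries(seq_len, full_layers + swa_layers, 0, window)
    return cached_entries(seq_len, full_layers, swa_layers, window) / baseline


if __name__ == "__main__":
    # Hypothetical shape: 8 full-attention layers, 40 sliding-window layers.
    for seq_len in (8_192, 131_072, 1_000_000):
        ratio = hybrid_to_full_ratio(seq_len, full_layers=8, swa_layers=40, window=4_096)
        print(f"{seq_len:>9} tokens: hybrid cache is {ratio:.3f} of the full-attention cache")
```

The saving grows with sequence length because the sliding-window layers stop accumulating entries once the window is full, which is why the benefit is most visible in long-context workloads.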

What happened to the Xiaomi MiMo 100 Trillion Token Grant?

The 100 Trillion Token Creator Incentive Plan officially concluded on May 26, 2026, after all tokens were claimed by developers ahead of schedule. While this specific program has ended, the exclusive benefit program for Apache Software Foundation committers remains available long-term and is not affected by the conclusion of the broader incentive plan.

How does the new MiMo Token Plan work for existing subscribers?

Existing Token Plan subscribers now receive five to eight times more usable tokens for the same price under the upgraded billing rules. As a one-time benefit, Xiaomi has fully reset the credit quotas for all users with an active plan, allowing them to start fresh with the significantly higher token allowances, effective immediately.

Is the MiMo-V2.5-TTS model still free to use?

The MiMo-V2.5-TTS model, which provides text-to-speech capabilities, remains free for a limited time following this pricing update. This is part of Xiaomi's broader effort to encourage developers to integrate and experience the full MiMo-V2.5 series, which also includes the Pro and Omni flagship models designed for complex reasoning and multimodal tasks.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Xiaomi MiMo Engineering Breakthrough Cuts Long Context KVCache Costs Sevenfold

Xiaomi MiMo released a full-pipeline optimization for its MiMo-V2.5 series to maximize the efficiency of its hybrid attention architecture. The update reduces KVCache storage requirements by 7x and achieves a 95% hit rate for long-context agentic workflows.

Xiaomi Launches MiMo-V2.5 Series With 1M Context and Reasoning Tokens

OpenRouter · Apr 30

Xiaomi released the MiMo-V2.5 series on OpenRouter, featuring a 1 million token context window and native multimodal support for image and video tasks. The models are specifically architected for long-horizon agentic workflows and coding, offering reasoning-enabled thinking tokens to improve task stability. By delivering pro-level performance at roughly half the typical inference cost, these models lower the economic barrier for deploying autonomous agents at scale.

OpenCode Adds Xiaomi MiMo v2.5 Models to Go for Agentic Coding

OpenCode · Apr 24

OpenCode integrated Xiaomi's MiMo v2.5 and v2.5 Pro models into its Go platform, offering native multimodality and specialized coding intelligence. These agent-centric models provide a 1-million-token context window for complex engineering tasks at the same price point as previous versions.

Arena.ai Ranks Xiaomi MiMo-V2.5 as Top Open Source Coding Model

Arena · Apr 30

Arena.ai validated Xiaomi's MiMo-V2.5-Pro as a top-three open-weight model for frontend web development following its official open-source release under the MIT license. The model features a 1-million-token context window and native multimodality, offering a high-performance alternative for commercial agentic workflows.
