Xiaomi MiMo Breaks 1,000 Tokens/s on 1T Model with Standard GPUs

MiMo

Jun 9, 2026 · Updated Jun 20, 2026

Xiaomi MiMo, in collaboration with TileRT, released MiMo-V2.5-Pro-UltraSpeed, achieving over 1,000 tokens/s output speed on a 1-trillion-parameter model using a single standard 8-GPU node. This breakthrough enables real-time AI applications and faster agentic coding by overcoming inference speed bottlenecks on commodity hardware.

MiMo-V2.5-Pro-UltraSpeed is a new mode for Xiaomi's 1-trillion-parameter model, developed with TileRT. Its output speed of over 1,000 tokens per second (tps) on a single standard 8-GPU node comes from deep model-system co-design using FP4 quantization and DFlash speculative decoding, with no specialized hardware required.
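To make the speculative decoding idea concrete, here is a minimal sketch of the generic greedy draft-and-verify loop. It is not DFlash or TileRT's implementation, whose internals this article does not describe. The functions `target_next` and `draft_next` and the toy integer "tokens" are hypothetical stand-ins, and a real system would verify all drafted positions in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the large model's greedy next token."""
    return (context[-1] * 3 + 1) % 17


def draft_next(context: List[Token]) -> Token:
    """Hypothetical cheap drafter: matches the target except right after a 5."""
    if context[-1] == 5:
        return 0  # deliberate mistake so the rejection path is exercised
    return target_next(context)


def greedy_decode(prompt: List[Token], target: NextFn, num_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token], target: NextFn, draft: NextFn, num_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verify passes)."""
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < num_new:
        # 1) The drafter proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target checks every proposed position. This is counted as a
        #    single pass because a real model scores all k positions at once.
        verify_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + num_new], verify_passes


if __name__ == "__main__":
    tokens, passes = speculative_decode([1], target_next, draft_next, num_new=12)
    print(tokens == greedy_decode([1], target_next, 12), passes)
```

The output is identical to plain greedy decoding; the gain is that the expensive model runs once per accepted block instead of once per token. With these toy functions, 12 tokens take 3 verify passes rather than 12 target calls.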

Output Speed: 1,000+ tokens/s
Model Parameters: 1 trillion (1T)
Hardware: Single, standard 8-GPU node
API Pricing: 3x cost for ~10x speed boost
Access Window: June 8 – June 23, 2026 (PDT), application-based
Open-Source Checkpoint: MiMo-V2.5-Pro-FP4-DFlash
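As a rough illustration of why 4-bit weights matter for fitting a 1T model on one 8-GPU node, the back-of-the-envelope arithmetic below compares raw weight storage at 16 bits and at 4 bits per parameter. It is an estimate only. The 16-bit baseline is an assumption, and the calculation ignores quantization scale factors, any layers kept at higher precision, KV cache, and activations. The article does not state per-GPU memory or the exact FP4 format.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1e12  # 1 trillion parameters, as stated in the article
NUM_GPUS = 8       # single 8-GPU node, as stated in the article

for label, bits in [("16-bit baseline (assumed)", 16), ("4-bit (FP4)", 4)]:
    total_gb = weight_footprint_gb(NUM_PARAMS, bits)
    per_gpu_gb = total_gb / NUM_GPUS
    print(f"{label}: {total_gb:,.0f} GB total, {per_gpu_gb:,.1f} GB per GPU")
```

Under these assumptions the weights shrink from about 2,000 GB to about 500 GB, or from 250 GB to 62.5 GB per GPU when split evenly across the node.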

This makes 1T models viable for real-time decision loops — high-frequency trading, anti-fraud, and surgical assistance — and removes inference latency as a bottleneck for agentic coding workflows. For comparison, DeepSeek V4 Pro reaches 150+ tps and Nemotron 3 Ultra 300+ tps.
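To put the quoted throughput figures in wall-clock terms, the sketch below converts tokens per second into generation time for a fixed output length. The 1,000, 300, and 150 tps values are the lower bounds quoted above. The 20,000-token output budget is a hypothetical figure for a multi-step agentic coding task, and the calculation ignores prompt processing, network latency, and tool-call time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation at a steady output rate."""
    return output_tokens / tokens_per_second


# Lower-bound output speeds as quoted in the article.
QUOTED_TPS = {
    "MiMo-V2.5-Pro-UltraSpeed": 1000,
    "Nemotron 3 Ultra": 300,
    "DeepSeek V4 Pro": 150,
}

OUTPUT_TOKENS = 20_000  # hypothetical output budget for one agentic task

for model, tps in QUOTED_TPS.items():
    seconds = generation_seconds(OUTPUT_TOKENS, tps)
    print(f"{model}: {seconds:.1f} s")
```

For this hypothetical budget the generation time works out to roughly 20 seconds at 1,000 tps, against about 67 seconds at 300 tps and about 133 seconds at 150 tps.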

Access is application-based and runs for a limited window, June 8–23, 2026 (PDT). During that window the web chat is free, and the API is priced at 3x the standard MiMo-V2.5-Pro rate for ~10x the generation speed. The MiMo-V2.5-Pro-FP4-DFlash checkpoint is open-sourced on Hugging Face.
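The pricing trade-off can be expressed as simple arithmetic. In this sketch the 3x price and ~10x speed multipliers come from the announcement, while `BASE_PRICE_PER_M_TOKENS` and `BASE_TPS` are placeholder values, since the article gives neither the standard MiMo-V2.5-Pro price nor its baseline speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price_per_m_tokens: float  # price per one million output tokens
    tokens_per_second: float

    def cost(self, output_tokens: int) -> float:
        return output_tokens / 1e6 * self.price_per_m_tokens

    def seconds(self, output_tokens: int) -> float:
        return output_tokens / self.tokens_per_second


BASE_PRICE_PER_M_TOKENS = 1.0  # placeholder, arbitrary currency units
BASE_TPS = 100.0               # placeholder baseline speed

standard = Tier("MiMo-V2.5-Pro", BASE_PRICE_PER_M_TOKENS, BASE_TPS)
ultra = Tier("UltraSpeed", 3 * BASE_PRICE_PER_M_TOKENS, 10 * BASE_TPS)

tokens = 500_000  # hypothetical job size
for tier in (standard, ultra):
    print(f"{tier.name}: cost {tier.cost(tokens):.2f}, time {tier.seconds(tokens):.0f} s")
```

Whatever the real base values turn out to be, the ratios stay the same: the same job costs three times as much and finishes in about a tenth of the generation time.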

View the full update on mimo.xiaomi.com

Xiaomi MiMo

@XiaomiMiMo · Jun 8

🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀

We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!

Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.

Read the full technical deep dive: https://t.co/MX0kjHKdKi

Want to experience the future of real-time AI? 👉 Apply for UltraSpeed now: https://t.co/aeWAxyhwVk

⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)

💬 Chat Experience: Completely FREE for a limited time. Try the blazing-fast web chat now.

⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.

🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com

Still wondering? A few quick answers below.

What is Xiaomi MiMo-V2.5-Pro-UltraSpeed?

Xiaomi MiMo-V2.5-Pro-UltraSpeed is a new mode for Xiaomi's 1-trillion-parameter AI model, developed in collaboration with TileRT. It achieves over 1,000 tokens per second in output speed on a single standard 8-GPU node.

How does MiMo-V2.5-Pro-UltraSpeed achieve its speed?

The speed comes from model-system co-design, optimized end to end by TileRT's inference system. Two techniques are named: FP4 quantization, which stores weights in a 4-bit floating-point format to shrink the model's memory footprint, and DFlash speculative decoding, a method in which a lightweight draft model proposes several tokens at once and the main model verifies them in parallel.

What are the key applications for this speed?

The high speed targets real-time decision-making for complex tasks such as high-frequency trading, instant anti-fraud checks, and surgical assistance. It also aims to accelerate agentic coding workflows by reducing inference latency.

How can I access MiMo-V2.5-Pro-UltraSpeed?

Access is application-based and limited to June 8 through June 23, 2026 (PDT). The web chat is free during this window, and the API is priced at 3x the standard MiMo-V2.5-Pro rate for roughly 10x the output speed.

Is the technology open-source?

Yes, Xiaomi MiMo has open-sourced the MiMo-V2.5-Pro-FP4-DFlash checkpoint on Hugging Face, which includes the FP4 quantized weights and DFlash model parameters.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Xiaomi Launches MiMo-V2.5 Series With 1M Context and Reasoning Tokens

Xiaomi released the MiMo-V2.5 series on OpenRouter, featuring a 1 million token context window and native multimodal support for image and video tasks. The models are specifically architected for long-horizon agentic workflows and coding, offering reasoning-enabled thinking tokens to improve task stability. By delivering pro-level performance at roughly half the typical inference cost, these models lower the economic barrier for deploying autonomous agents at scale.

Xiaomi MiMo Engineering Breakthrough Cuts Long Context KVCache Costs Sevenfold

MiMo · May 31

Xiaomi MiMo released a full-pipeline optimization for its MiMo-V2.5 series to maximize the efficiency of its hybrid attention architecture. The update reduces KVCache storage requirements by 7x and achieves a 95% hit rate for long-context agentic workflows.

OpenCode Adds Xiaomi MiMo v2.5 Models to Go for Agentic Coding

OpenCode · Apr 24

OpenCode integrated Xiaomi's MiMo v2.5 and v2.5 Pro models into its Go platform, offering native multimodality and specialized coding intelligence. These agent-centric models provide a 1-million-token context window for complex engineering tasks at the same price point as previous versions.

Arena.ai Ranks Xiaomi MiMo-V2.5 as Top Open Source Coding Model

Arena · Apr 30

Arena.ai validated Xiaomi's MiMo-V2.5-Pro as a top-three open-weight model for frontend web development following its official open-source release under the MIT license. The model features a 1-million-token context window and native multimodality, offering a high-performance alternative for commercial agentic workflows.
