HeadsUpAI

Kimi Reveals How It Scaled K2.5 at NVIDIA GTC 2026


Kimi (Moonshot AI), a Chinese AI lab building frontier language models, has released its NVIDIA GTC 2026 session on demand. In it, CEO Zhilin Yang walks through the engineering decisions behind Kimi K2.5. The first is replacing the Adam optimizer with the Muon optimizer for large-scale pre-training, a change that, according to the session, doubles token learning efficiency. The second is co-designing the model architecture and training infrastructure from Day 0, on NVIDIA Hopper and Blackwell hardware, to keep training stable at scale.

The session also covers a shift toward AI-native training, where the model actively participates in its own data synthesis, evaluation, and evolution. Separately, Yang presents the case for linear attention architectures as the foundation for longer-running AI agents — a direction that signals where Kimi's next generation of models is heading.

Watch the full on-demand session to follow the Muon optimizer breakdown, the Day 0 co-design methodology, and Kimi's linear attention roadmap.

Kimi.ai (@Kimi_Moonshot) on X:

Zhilin's full GTC 2026 keynote is here. If you're curious about the "how" behind scaling Kimi’s latest models, this is the session you can't miss. :) https://t.co/rRgPzau6e5
