Zhilin's full GTC 2026 keynote is here. If you're curious about the "how" behind scaling Kimi’s latest models, this is the session you can't miss. :) https://t.co/rRgPzau6e5
Kimi Reveals How It Scaled K2.5 at NVIDIA GTC 2026
Kimi CEO Zhilin Yang detailed the training innovations behind Kimi K2.5 at NVIDIA GTC 2026. The session covers the Muon optimizer replacing Adam to double token learning efficiency, AI-native training, and a shift toward linear attention for longer-running agents.
In the session, Yang explains how the team replaced the Adam optimizer with the Muon optimizer during massive-scale pre-training, which doubles token learning efficiency. The team co-designed model architecture and training infrastructure from Day 0, on NVIDIA Hopper and Blackwell hardware, to achieve training stability at scale.
The session also covers a shift toward AI-native training, where the model actively participates in its own data synthesis, evaluation, and evolution. Separately, Yang presents the case for linear attention architectures as the foundation for longer-running AI agents, a direction that signals where Kimi's next generation of models is heading.
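For readers new to Muon: as publicly described, the optimizer accumulates momentum for each 2D weight matrix and then approximately orthogonalizes that momentum before applying it as the update. The sketch below is a minimal pure-Python illustration of that idea, not Kimi's training code. The helper names (`matmul`, `transpose`, `orthogonalize`, `muon_step`), the learning rate, and the momentum value are assumptions made for illustration, and it swaps in the plain cubic Newton-Schulz iteration where public implementations use a tuned variant.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def orthogonalize(m, steps=20):
    """Approximate the nearest (semi-)orthogonal matrix with cubic Newton-Schulz.

    Assumption: a simple textbook iteration, X <- 1.5*X - 0.5*(X X^T X),
    chosen for clarity rather than speed.
    """
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in m for v in row)) or 1.0
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x

def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon-style update for a single 2D weight matrix.

    lr and beta are illustrative values, not settings reported in the session.
    """
    # Accumulate momentum from the raw gradient.
    momentum_buf = [[beta * m + g for m, g in zip(mr, gr)]
                    for mr, gr in zip(momentum_buf, grad)]
    # Orthogonalize the momentum so all update directions have similar scale.
    update = orthogonalize(momentum_buf)
    weight = [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
    return weight, momentum_buf

if __name__ == "__main__":
    w = [[0.0, 0.0], [0.0, 0.0]]
    g = [[3.0, 1.0], [1.0, 2.0]]
    buf = [[0.0, 0.0], [0.0, 0.0]]
    w, buf = muon_step(w, g, buf)
    print(w)
```

The contrast with Adam is that Adam rescales each parameter individually using running gradient statistics, while this style of update normalizes the whole matrix at once. Whether and how Kimi's production setup differs from this sketch is covered in the session itself, not here.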
Watch the full on-demand session to follow the Muon optimizer breakdown, the Day 0 co-design methodology, and Kimi's linear attention roadmap.
Every HeadsUpAI update is written based on its original source and reviewed before it's published.





