Kimi.ai
@Kimi_Moonshot
Zhilin's full GTC 2026 keynote is here. If you're curious about the "how" behind scaling Kimi’s latest models, this is the session you can't miss. :) https://t.co/rRgPzau6e5
Adam optimizer with the Muon optimizer during massive-scale pre-training, which doubles token learning efficiency. The team co-designed the model architecture and training infrastructure from Day 0, on NVIDIA Hopper and Blackwell hardware, to achieve training stability at scale. The session also covers a shift toward AI-native training, where the model actively participates in its own data synthesis, evaluation, and evolution. Separately, Yang presents the case for linear attention architectures as the foundation for longer-running AI agents, a direction that signals where Kimi's next generation of models is heading.
Watch the full on-demand session to follow the Muon optimizer breakdown, the Day 0 co-design methodology, and Kimi's linear attention roadmap.