Training Kimi K2 and Qwen3 30B-scale models efficiently requires more than standard data-parallel tricks. NVIDIA Megatron Core now provides end-to-end support for emerging higher-order optimizers like Muon, alongside research optimizers such as MOP and REKLS, to push training efficiency on GB300 GPUs and NVL72 systems. Full breakdown 👇 https://t.co/D7E55OnCiK
NVIDIA Megatron Core Adds Muon Support to Accelerate Frontier Model Training
The move addresses scaling challenges for 30B-parameter models and sits alongside NVIDIA's FP8 training support for Qwen3, with Kimi K2 cited for its token learning efficiency. Where standard optimizers hit efficiency ceilings, higher-order optimizers such as Muon can significantly increase token learning efficiency, that is, how much a model learns from each training token. This efficiency gain is compared to the leaderboard ranking achieved by NVIDIA's Nemotron 3 Super on enterprise benchmarks.
You can now use these optimizers directly within the Megatron Core training workflow to reduce the compute required for frontier-scale training. The support is tuned for the Blackwell architecture, delivering throughput close to that of traditional optimizers while converging faster. The updated framework is available through NVIDIA's official developer portal.
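To make the mechanism behind Muon concrete, here is a minimal, dependency-free Python sketch of a Muon-style update for a single 2D weight matrix. It is not Megatron Core's implementation or API: the function names `orthogonalize` and `muon_step` and the values `lr=0.02` and `beta=0.95` are illustrative assumptions. The idea is to accumulate momentum as usual, then replace the update with its nearest orthogonal matrix (the polar factor U V^T of U S V^T) using Newton-Schulz iterations, which need only matrix multiplies. The open-source Muon reference code uses a tuned quintic iteration with about five steps; this sketch swaps in the classic cubic iteration because its convergence is easy to check.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for row-major lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 20, eps: float = 1e-7) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Classic cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X.
    (Assumption: swapped in for Muon's tuned quintic variant, for clarity.)
    """
    # Divide by the Frobenius norm so every singular value is <= 1,
    # which keeps the iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in g for v in row)) + eps
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum_buf: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One Muon-style update, applied in place to one 2D weight matrix.

    lr and beta are illustrative defaults (assumed), not Megatron Core settings.
    """
    rows, cols = len(weight), len(weight[0])
    # Momentum accumulation: buf <- beta * buf + grad.
    for i in range(rows):
        for j in range(cols):
            momentum_buf[i][j] = beta * momentum_buf[i][j] + grad[i][j]
    # Nesterov-style look-ahead, as in the open-source Muon reference code.
    update = [[grad[i][j] + beta * momentum_buf[i][j] for j in range(cols)]
              for i in range(rows)]
    ortho = orthogonalize(update)
    # Shape-dependent scale keeps update magnitude comparable across shapes.
    scale = math.sqrt(max(1.0, rows / cols))
    for i in range(rows):
        for j in range(cols):
            weight[i][j] -= lr * scale * ortho[i][j]


if __name__ == "__main__":
    w = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
    g = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
    buf = [[0.0] * 3 for _ in range(2)]
    muon_step(w, g, buf)
    print(w)
```

Because the orthogonalized update has all singular values equal to one, every direction in the weight matrix moves by a similar amount regardless of how large its raw gradient was. In the common Muon setup, only hidden-layer 2D weights are updated this way; embeddings, output heads, and vector parameters such as norms and biases are typically left to an AdamW-style optimizer.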