Fireworks AI Launches Infrastructure for Training Trillion-Parameter MoE Models

Fireworks AI

Apr 5, 2026 · Updated Apr 25, 2026

Fireworks AI released a major update to its Training SDK featuring Blackwell-native kernels and 4D parallelism for trillion-parameter Mixture-of-Experts models. By fusing reinforcement learning losses and optimizing for asynchronous data, the platform enables frontier-grade model training that was previously restricted to elite research labs.

Fireworks AI updated its Training SDK with a specialized engine for very large Mixture-of-Experts models, up to the trillion-parameter scale, such as Qwen3.5 397B and Kimi K2.5. The system introduces composable 4D parallelism, which automatically orchestrates sharding across four dimensions: data, pipeline, context, and expert. According to Fireworks, this infrastructure was used to train Cursor's Composer 2 model.
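To make the four parallelism dimensions concrete, here is a minimal sketch of how a flat GPU rank can be mapped to a coordinate on a data × pipeline × context × expert grid. It is not the Fireworks Training SDK API: the `Mesh4D` class, the axis ordering, and the example sizes are all hypothetical, and treating the four dimensions as fully independent axes is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mesh4D:
    """Hypothetical 4D device grid; sizes are illustrative, not from the article."""
    data: int
    pipeline: int
    context: int
    expert: int

    @property
    def world_size(self) -> int:
        # Total number of GPUs is the product of the four axis sizes.
        return self.data * self.pipeline * self.context * self.expert

    def coords(self, rank: int) -> dict[str, int]:
        """Map a flat rank to per-axis indices (expert axis varies fastest)."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside 0..{self.world_size - 1}")
        out: dict[str, int] = {}
        for name in ("expert", "context", "pipeline", "data"):
            size = getattr(self, name)
            out[name] = rank % size
            rank //= size
        return out


# Assumed example layout: 2 * 4 * 2 * 8 = 128 GPUs.
mesh = Mesh4D(data=2, pipeline=4, context=2, expert=8)
print(mesh.world_size)
print(mesh.coords(9))
```

Ranks that share every coordinate except one form the communication group for that axis, which is what lets the four strategies be composed rather than hand-wired.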

Fireworks argues that training frontier models is increasingly an infrastructure bottleneck rather than a modeling one. The new stack uses MXFP8 kernels on NVIDIA Blackwell hardware, which the company says deliver significant speedups over BF16 without losing numerical accuracy. Fused reinforcement learning losses also make PPO roughly 2x faster by eliminating redundant forward passes.
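For readers unfamiliar with MXFP8, the sketch below simulates the general idea in NumPy: values are grouped into blocks of 32, each block shares one power-of-two scale, and each element is rounded to an FP8 (E4M3) value. It is only an illustration, not Fireworks' kernels. The scale-selection rule is one common choice and is an assumption here, the function names are made up, and NaN/Inf handling and the limited range of the shared scale are omitted.

```python
import numpy as np

BLOCK = 32         # elements that share one scale
E4M3_MAX = 448.0   # largest finite E4M3 magnitude (1.75 * 2**8)
E4M3_EMAX = 8      # exponent of the largest E4M3 binade
E4M3_EMIN = -6     # smallest normal E4M3 exponent


def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Round to the nearest value with 3 mantissa bits, saturating at +/-448."""
    mag = np.abs(x)
    _, e = np.frexp(mag)                 # mag = m * 2**e with m in [0.5, 1)
    exp = np.maximum(e - 1, E4M3_EMIN)   # floor(log2(mag)), clamped for subnormals
    step = 2.0 ** (exp - 3)              # spacing between representable values
    q = np.round(mag / step) * step
    return np.sign(x) * np.minimum(q, E4M3_MAX)


def mxfp8_roundtrip(x) -> np.ndarray:
    """Quantize then dequantize; x.size must be a multiple of BLOCK."""
    x = np.asarray(x, dtype=np.float64)
    blocks = x.reshape(-1, BLOCK)
    amax = np.max(np.abs(blocks), axis=1, keepdims=True)
    _, e = np.frexp(amax)
    # Assumed rule: put each block's largest value in the top E4M3 binade.
    scale = 2.0 ** (e - 1 - E4M3_EMAX)
    return (round_to_e4m3(blocks / scale) * scale).reshape(x.shape)


x = np.zeros(BLOCK)
x[0], x[1] = 1.0, 0.3
print(mxfp8_roundtrip(x)[:2])
```

Individual elements do pick up rounding error in a scheme like this (0.3 is not exactly representable), so a claim of preserved accuracy is presumably about end-to-end training quality rather than bit-exact values.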

These training configurations are available through the Training SDK, which supports fine-tuning at context lengths up to one million tokens. For resource-constrained environments, the platform supports LoRA fine-tuning of trillion-parameter models on a single 8-GPU node using 4x expert quantization. Both managed fine-tuning and custom training loops are available via the API.
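A rough back-of-the-envelope calculation shows why quantization matters for the single-node claim. The sketch below rests on several assumptions that are not in the article: "4x" is read as a 4x reduction in weight memory relative to BF16, it is applied to all one trillion parameters for simplicity, and the per-GPU memory figure is a hypothetical placeholder.

```python
GB = 1e9  # decimal gigabytes


def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return num_params * bytes_per_param / GB


NUM_PARAMS = 1e12        # one trillion parameters
BF16_BYTES = 2.0         # BF16 stores each parameter in 2 bytes
COMPRESSION = 4.0        # assumed meaning of "4x" quantization
GPUS_PER_NODE = 8        # from the article
GPU_MEMORY_GB = 180.0    # hypothetical per-GPU memory, not from the article

bf16_gb = weights_gb(NUM_PARAMS, BF16_BYTES)
quant_gb = weights_gb(NUM_PARAMS, BF16_BYTES / COMPRESSION)
node_gb = GPUS_PER_NODE * GPU_MEMORY_GB

print(f"BF16 weights:      {bf16_gb:.0f} GB")
print(f"Quantized weights: {quant_gb:.0f} GB")
print(f"Node memory:       {node_gb:.0f} GB")
print(f"BF16 fits: {bf16_gb <= node_gb}, quantized fits: {quant_gb <= node_gb}")
```

Under these assumptions the BF16 weights alone exceed the node's memory, while the quantized weights leave room for activations and the comparatively small LoRA adapter and optimizer state.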

View the full update on fireworks.ai

Fireworks AI (@FireworksAI_HQ) · Apr 4

Training trillion-parameter MoEs is an infra problem disguised as a modeling problem. So we built the infra solution. Cursor used it to train Composer 2. Now it's available for Kimi K2.5, Qwen3.5 397B, MiniMax M2.5, and more:

→ Fused RL loss (~2x faster PPO)
→ MXFP8 expert kernels on Blackwell
→ Composable 4D parallelism
→ 1M+ token context training validated

Here's how it all works ↓ https://t.co/PA20I8EFaD

