Kimi K2.6 from @Kimi_Moonshot is now available on @FireworksAI_HQ Training Platform across the Managed and Training API workflows. Try SFT, DPO, RL with smart defaults or your own custom loss function with industry leading 265K context window. https://t.co/jqKuwWWEB0
Fireworks AI Adds Kimi K2.6 Training to Build Custom Frontier Agents
Fireworks AI, an inference platform for fast model serving, added Kimi K2.6 to its training platform across managed and API workflows. Users can perform supervised fine-tuning (SFT) on the 1-trillion-parameter Mixture-of-Experts (MoE) model, an architecture that activates only the relevant sub-networks for each input. The platform supports the model's full 265K context window.

- Model: Kimi K2.6
- Parameters: 1 trillion (MoE)
- Context window: 265K tokens
- Training methods: SFT, DPO, RL, custom loss
- Infrastructure: Blackwell B200 support
- Availability: Managed Training and Training API
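To make the Mixture-of-Experts idea concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count, the value of k, and the gate scores are illustrative assumptions, not Kimi K2.6's actual configuration or Fireworks code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Hypothetical router scores for one token over 8 experts (assumed values).
gate_scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
selected = route_top_k(gate_scores, k=2)
```

Only the selected experts run for that token, which is why the active parameter count per token is much smaller than the total parameter count.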
This release follows Fireworks AI's Day-0 Kimi K2.6 inference support and provides the infrastructure to customize the model powering Cursor's Composer 2. By training and serving on the same hardware, teams avoid numerical drift, where model behavior changes when moving from training to production inference.
You can now use the Training API for custom loss functions or reinforcement learning (RL) at scale. Managed training handles GPU provisioning and distributed scaling for jobs from small adapters to full-parameter tuning. Access is available via the Fireworks dashboard, with pricing based on GPU-hour or token usage.
Fireworks AI
@FireworksAI_HQ
Still wondering? A few quick answers below.
The Training Agent is an autonomous tool for product teams that handles data cleaning, model selection, and deployment using LoRA-only methods. Managed Training is designed for ML engineers, offering deeper control over methods like SFT and DPO, including full-parameter training for behavioral changes that adapter-based methods cannot achieve.
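A rough way to see the gap between adapter-based and full-parameter training is to count trainable parameters for a single weight matrix. The sketch below uses the standard LoRA factorization (a rank-r update B·A); the layer size and rank are hypothetical and are not taken from Kimi K2.6 or from Fireworks defaults.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA update B (d_out x rank) @ A (rank x d_in)."""
    return rank * (d_in + d_out)

# Hypothetical projection layer and adapter rank (assumed values).
d_in = d_out = 4096
rank = 16

full = full_params(d_in, d_out)        # 16,777,216
lora = lora_params(d_in, d_out, rank)  # 131,072
fraction = lora / full                 # 0.0078125, i.e. under 1% of the matrix
```

Because the adapter can only express a low-rank change to each matrix, some behavioral changes may need the full-parameter path.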
Fireworks AI supports full-parameter training across its model catalog, including models up to 1 trillion parameters like Kimi K2.5. While Kimi K2.6 is currently available for SFT, DPO, and RL workflows, the platform handles the distributed complexity required for these massive Mixture-of-Experts models on high-performance hardware like Blackwell B200 GPUs.
Fireworks AI achieves training-inference parity by using the same kernels and hardware for both training and production inference. This avoids numerical drift, so model behavior during evaluation matches its behavior in production. The platform publishes KL divergence values for its catalog, where values below 0.01 are taken to indicate that the training and serving stacks are numerically equivalent in practice.
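As an illustration of what such a check measures, the sketch below computes the KL divergence between next-token distributions produced by two stacks for the same inputs and compares the mean against the 0.01 threshold quoted above. The logits, the direction of the divergence, and the averaging scheme are assumptions made for the example; this is not Fireworks's published methodology.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(train_logits, serve_logits):
    """KL(P_train || P_serve) in nats for one token position."""
    log_p = log_softmax(train_logits)
    log_q = log_softmax(serve_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kl(train_batch, serve_batch):
    """Average per-position KL across a batch of positions."""
    kls = [kl_divergence(p, q) for p, q in zip(train_batch, serve_batch)]
    return sum(kls) / len(kls)

THRESHOLD = 0.01  # threshold quoted in the text

# Made-up logits for two token positions from each stack (assumed values).
train_batch = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
serve_batch = [[2.01, 0.49, -1.0], [0.1, 0.21, 2.99]]

drift = mean_kl(train_batch, serve_batch)
within_threshold = drift < THRESHOLD
```

A value of exactly zero means the two distributions match at every position; small positive values indicate minor numerical differences between the stacks.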
The platform supports several fine-tuning and alignment methods, including Supervised Fine-Tuning, Direct Preference Optimization, and Reinforcement Fine-Tuning. Users can chain these stages together, such as moving from SFT to DPO while preserving optimizer states. The Training API also allows researchers to implement custom loss functions like GRPO or DAPO without rigid recipes.
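For readers unfamiliar with what a preference loss looks like, here is a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. The function name, the log-probability inputs, and the beta value are illustrative assumptions; this does not show the Fireworks Training API, whose interface is not described in this article.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)

# Policy raises the chosen response and lowers the rejected one (assumed values).
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the signal the optimizer follows.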
When training Kimi K2.6 on the Fireworks platform, you can use the full 265K context window. This lets individual training examples span very long documents or long-horizon tasks without being truncated. The platform manages the underlying distributed systems complexity, such as composable parallelism, to support these long sequences at scale.





