Fireworks Training Platform continues to expand. Today GLM 5.1 LoRA RL is now live via Training API: SFT, DPO, and full RL on a 200K context window → custom loss functions or smart defaults. No usage ceilings. No credits to claim. Your model. Your inference. Get started → https://t.co/sBNnhKT5dq
Fireworks AI Adds Reinforcement Learning for GLM 5.1 to Build Custom Agents
GLM 5.1 LoRA RL is now available through the Fireworks Training API. The update builds on the platform's existing GLM 5.1 fine-tuning support, using LoRA (an efficient adaptation technique) to run RL over a 200K context window.
- Context window: 200K tokens
- Training methods: SFT, DPO, and LoRA RL
- Usage limits: None
- Model ownership: Full weight ownership
- Access: Training API
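For readers unfamiliar with LoRA, the sketch below shows the general idea in plain Python: the base weight matrix stays frozen and only a small low-rank pair of matrices is trained. This is a generic illustration of the technique, not Fireworks code; the function names, shapes, and the `alpha` scaling value are assumptions chosen for the example.

```python
# Generic LoRA illustration (assumed names and shapes; not the Fireworks implementation).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank.

    w: frozen base weight, shape (out, in)
    a: trainable matrix,   shape (r, in)
    b: trainable matrix,   shape (out, r)
    """
    r = len(a)
    delta = matmul(b, a)  # (out, r) @ (r, in) -> (out, in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(out_features, in_features, r):
    """Number of adapter parameters for one weight matrix."""
    return r * (in_features + out_features)


# Frozen 2x3 base weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]  # shape (1, 3)
B = [[0.5], [0.0]]     # shape (2, 1)

W_eff = lora_effective_weight(W, A, B, alpha=2.0)
print(W_eff)
print(trainable_params(4096, 4096, 16), "adapter params vs", 4096 * 4096, "full params")
```

The parameter count is why the technique is called efficient: for a hypothetical 4096 by 4096 layer at rank 16, the adapter trains 131,072 values instead of roughly 16.8 million.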
Adding RL matters for teams building specialized reasoning agents that fine-tuning alone can't shape. Custom loss functions let teams encode their own domain objectives instead of relying on rigid recipes, turning proprietary data into a differentiated model. This follows the launch of Fireworks AI's Qwen 3.6 fine-tuning and a broader expansion of the Fireworks Training Platform.
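As a rough illustration of what encoding a domain objective can look like, the sketch below pairs a hypothetical reward function with a simple policy-gradient loss that uses a mean-reward baseline. It is plain Python with made-up names (`domain_reward`, `policy_gradient_loss`) and toy numbers; it does not reflect the Training API's actual interface or default losses.

```python
# Hypothetical custom objective (assumed names; not the Fireworks Training API).

def domain_reward(response, required_terms, max_words):
    """Reward coverage of required terms and penalize over-long answers."""
    text = response.lower()
    hits = sum(1 for term in required_terms if term in text)
    coverage = hits / len(required_terms)
    penalty = 0.5 if len(response.split()) > max_words else 0.0
    return coverage - penalty


def policy_gradient_loss(logprobs, rewards):
    """REINFORCE-style loss with a mean-reward baseline.

    logprobs: summed log-probability of each sampled response under the policy
    rewards:  scalar reward for each response
    """
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(rewards)


# Two sampled responses to the same prompt (toy data).
responses = ["Refund issued per policy 12", "I cannot help"]
rewards = [domain_reward(r, ["refund", "policy"], max_words=10) for r in responses]
logprobs = [-2.0, -1.0]  # placeholder values a real policy would supply

loss = policy_gradient_loss(logprobs, rewards)
print(rewards, loss)
```

Minimizing this loss raises the probability of responses that score above the batch average and lowers it for those below, which is how the reward definition steers the model toward the domain objective.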
You can access these capabilities through the Training API with no usage ceilings. SFT, DPO, and RL run on the same infrastructure that serves Fireworks production inference, so a trained checkpoint becomes a live endpoint without format conversion or stack migration. You retain ownership of the resulting model weights for your own deployment.