Fireworks AI Adds Reinforcement Learning for GLM 5.1 to Build Custom Agents

Fireworks AI

May 15, 2026 · Updated Jun 12, 2026

Fireworks AI expanded its training platform to support LoRA-based reinforcement learning for Z.ai's GLM 5.1 model. Developers can use custom loss functions to align the model's reasoning steps with their own domain logic, with a context window of 200K tokens.

Fireworks AI, an inference platform for fast model serving, added reinforcement learning (RL) support for GLM 5.1 via its Training API. The update builds on the platform's existing GLM 5.1 fine-tuning support and uses LoRA (Low-Rank Adaptation, a parameter-efficient fine-tuning technique) to run RL with a 200K-token context window.

Context window: 200K tokens
Training methods: SFT, DPO, and LoRA RL
Usage limits: None
Model ownership: Full weight ownership
Access: Training API
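To make the "LoRA" in LoRA RL concrete, here is a minimal sketch of the general technique, not of Fireworks' implementation. LoRA freezes a base weight matrix W and trains two small matrices A and B, so the effective weight is W + (alpha / r) * B A. The function names and the layer dimensions below are illustrative assumptions and are not GLM 5.1's actual sizes.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable adapter, shape (r, d_in)
    b: trainable adapter, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of adapter parameters trained instead of the d_out * d_in base weights."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    a = [[1.0, 0.0]]          # rank r = 1
    b_zero = [[0.0], [0.0]]   # B starts at zero, so the adapter is a no-op at first
    print(lora_effective_weight(w, a, b_zero, alpha=2.0))

    # Hypothetical 4096 x 4096 layer with rank 16 (illustrative numbers only).
    full = 4096 * 4096
    lora = trainable_params(4096, 4096, 16)
    print(f"adapter trains {lora} of {full} weights ({lora / full:.2%})")
```

Because only A and B receive gradients, the trainable parameter count stays a small fraction of the base layer, which is why LoRA is commonly described as an efficient adaptation method.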

Adding RL matters for teams building specialized reasoning agents that supervised fine-tuning alone can't shape. Custom loss functions let teams encode their own domain objectives instead of relying on rigid recipes, turning proprietary data into a differentiated model. This follows the launch of Fireworks AI's Qwen 3.6 fine-tuning and a broader expansion of the Fireworks Training Platform.
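As a rough illustration of what a custom RL objective can look like, the sketch below pairs a hypothetical domain reward with a REINFORCE-style loss that uses group-normalized advantages. It is plain arithmetic on Python floats and does not reflect the Fireworks Training API's actual interface. The names `domain_reward`, `group_advantages`, and `policy_gradient_loss`, along with the length penalty of 0.01 per word, are assumptions made for the example. In a real trainer the log-probabilities would be tensors that carry gradients.

```python
from statistics import mean, pstdev


def domain_reward(response: str, expected: str, max_words: int = 50) -> float:
    """Hypothetical domain rule: reward a correct final answer, penalize rambling."""
    correct = 1.0 if response.strip().endswith(expected) else 0.0
    overflow = max(0, len(response.split()) - max_words)
    return correct - 0.01 * overflow


def group_advantages(rewards):
    """Center and scale rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def policy_gradient_loss(logprobs, rewards):
    """Return -mean(advantage * sequence log-probability).

    Minimizing this raises the log-probability of above-average responses
    and lowers it for below-average ones.
    """
    advantages = group_advantages(rewards)
    return -mean([a * lp for a, lp in zip(advantages, logprobs)])


if __name__ == "__main__":
    responses = ["The answer is 42", "I am not sure"]
    rewards = [domain_reward(r, expected="42") for r in responses]
    logprobs = [-1.0, -2.0]  # made-up sequence log-probabilities
    print(rewards, policy_gradient_loss(logprobs, rewards))
```

The reward function is where the domain logic lives. Swapping in a different rule (a unit test pass rate, a schema check, a rubric score) changes what the model is pushed toward without changing the loss itself.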

You can access these capabilities through the Training API with no usage ceilings. SFT, DPO, and RL run on the same infrastructure that serves Fireworks production inference, so a trained checkpoint becomes a live endpoint without format conversion or stack migration. You retain ownership of the resulting model weights for your own deployment.

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQ · May 14

Fireworks Training Platform continues to expand. Today GLM 5.1 LoRA RL is now live via Training API: SFT, DPO, and full RL on a 200K context window → custom loss functions or smart defaults. No usage ceilings. No credits to claim. Your model. Your inference. Get started → https://t.co/sBNnhKT5dq

Still wondering? A few quick answers below.

What is GLM 5.1?

GLM 5.1 is a large language model developed by Z.ai. It is available on the Fireworks AI platform for customization, allowing developers to use reinforcement learning to adapt the model to their own data and reward signals. The Fireworks Training Platform handles the infrastructure so teams can focus on the training objective rather than GPU orchestration.

How does reinforcement learning work on the Fireworks Training Platform?

The Fireworks Training Platform supports reinforcement learning for GLM 5.1 using Low-Rank Adaptation, or LoRA, a parameter-efficient method for fine-tuning large models. Developers can use the Training API to run supervised fine-tuning, direct preference optimization, or full reinforcement learning. Training can use custom loss functions or the platform's smart defaults.

What are the usage limits for GLM 5.1 training on Fireworks AI?

Fireworks AI provides access to GLM 5.1 training with no usage ceilings or credit-claiming requirements. This means developers can train and deploy their custom models without hitting the restrictive caps often found on closed-source frontier platforms. Once training is complete, the custom model weights are available for inference on the same Fireworks infrastructure.

What is the context window for GLM 5.1 training?

Training for GLM 5.1 on the Fireworks platform supports a 200K-token context window. This large window lets the model learn from long individual examples, such as long technical documents or codebases, during the reinforcement learning and fine-tuning phases. This capability is useful for tasks that need coherence across substantial amounts of context.

Can I deploy my own model weights after training on Fireworks?

Yes, the Fireworks Training Platform is designed so that you own your model and your inference. After using the Training API to customize GLM 5.1 with reinforcement learning or fine-tuning, the resulting model is yours to use. You can run inference on your custom model directly through the Fireworks cloud, which is optimized for fast, reliable inference.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Fireworks AI Adds GLM 5.1 Training to Build Long Horizon Coding Agents

Fireworks AI added Z.ai's GLM 5.1 to its training platform, supporting supervised fine-tuning and direct preference optimization with a 200K context window. This allows developers to customize the flagship agentic model for multi-hour autonomous tasks without the numerical drift common in fragmented training and inference stacks.

Z.ai Launches GLM-5-Turbo, Optimized for Agent Tasks from the Training Phase

Zhipu AI · Mar 16

Z.ai released GLM-5-Turbo, a fast GLM-5 variant optimized for agent scenarios from the training phase onward. It targets what agent deployments need: reliable tool calls, complex instruction decomposition, and stable execution through persistent tasks. Available via the Z.ai API and OpenRouter.
