Fireworks AI Adds RL for Gemma 4 Dense to Build Reasoning Agents

Fireworks AI

May 16, 2026 · Updated Jun 12, 2026

Fireworks AI expanded its training platform to support full-parameter and LoRA-based reinforcement learning for Google's Gemma 4 Dense model. This allows developers to perform SFT, DPO, or RL on the model's full 256K context window using a unified stack that eliminates numerical drift between training and production.

Fireworks AI, an inference platform for fast model serving, added reinforcement learning (RL, a method that trains a model to maximize a reward signal) support for Gemma 4 Dense. The update enables both full-parameter training and LoRA (a memory-efficient method that trains small adapter matrices instead of all weights) across the 256K context window, following Fireworks AI's earlier Gemma 4 integration.

Context window: 256K tokens
Training methods: SFT, DPO, and RL
Training modes: Full-parameter and LoRA
Max model scale: 1T parameters
Availability: Fireworks Training Platform Preview

The update targets the "vibe coding" bottleneck, where generic models fail on domain-specific logic. By running training and serving on one unified infrastructure, Fireworks prevents numerical drift, the gap in model behavior between training and production stacks. It follows similar recent expansions: Fireworks AI's Kimi K2.6 RL support and GLM 5.1 RL support.
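One common source of such gaps, in general, is that two stacks may reduce the same floating-point values in a different order. The sketch below is only an illustration of that effect in plain Python. It is not Fireworks code, and the function names are hypothetical.

```python
# Illustrative only: floating-point addition is not associative, so two
# kernels that sum the same terms in a different order can disagree.

def sum_forward(values):
    """Accumulate left to right, as one hypothetical kernel might."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_backward(values):
    """Accumulate right to left, as a different hypothetical kernel might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


terms = [0.1, 0.2, 0.3]
forward = sum_forward(terms)
backward = sum_backward(terms)
drift = abs(forward - backward)

print(f"forward={forward!r} backward={backward!r} drift={drift:.3e}")
```

The difference for a single sum is tiny, but a model performs a very large number of such reductions per token, which is why sharing kernels between training and serving is a plausible way to keep behavior aligned.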

Use the Training API to write custom loss functions, or use managed workflows, to specialize Gemma 4 for long-context reasoning. The platform supports elastic RL rollouts across regions, allowing you to scale training without managing GPU clusters. Trained checkpoints can be hot-loaded into production endpoints in seconds.
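To make "custom loss function" concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It does not use the Fireworks Training API, whose interface is not described here. The function name, argument names, and the default `beta` are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probs.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The policy matches the reference: the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# The policy now prefers the chosen response: the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

In a real training loop the log-probabilities would come from the policy and a frozen reference model, and the loss would be averaged over a batch.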

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQ · May 15

Weekends are for vibe coding. But are your vibes continuously improving? Fine-tune your own model → stop waiting on someone else's release cycle. Today's training update: Gemma 4 Dense is now available for Full Param + LoRA RL. SFT, DPO, or RL on 256K context. Get started with the Fireworks Training Platform. https://t.co/rqSamw3I3e

Still wondering? A few quick answers below.

What is the context window for Gemma 4 Dense training on Fireworks AI?

Fireworks AI supports training for Gemma 4 Dense with a context window of up to 256K tokens. This allows developers to perform supervised fine-tuning, direct preference optimization, or reinforcement learning on long-context data. The platform ensures that the model behavior during training matches production inference by using the same underlying hardware and kernels.

How does Fireworks AI handle full-parameter training for large models?

The Fireworks Training Platform supports full-parameter training for models ranging from small dense versions up to trillion-parameter systems like Kimi K2.5. Unlike adapter-based methods such as LoRA, full-parameter training updates every weight, which allows for deeper behavioral changes. Fireworks manages the distributed systems complexity, including composable parallelism and precision tuning, to support these large-scale training runs.
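The trade-off between the two modes comes down to how many weights are trainable. As a rough sketch, a LoRA adapter of rank r on a weight matrix of shape d_out x d_in trains r * (d_in + d_out) values instead of d_out * d_in. The layer size and rank below are hypothetical and are not Gemma 4's actual dimensions.

```python
def full_param_count(d_out, d_in):
    """Trainable weights when the whole matrix is updated."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Trainable weights in a LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical projection layer and adapter rank, chosen for illustration.
d_out, d_in, rank = 4096, 4096, 16

full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)

print(f"full-parameter: {full:,} trainable weights")
print(f"LoRA rank {rank}: {lora:,} trainable weights")
print(f"reduction: {full // lora}x")
```

The adapter constrains the update to a low-rank subspace, which is why it is cheaper and also why full-parameter training can change behavior more deeply.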

What is the difference between the Fireworks Training Agent and Managed Training?

The Training Agent is an autonomous tool for product teams that handles data cleaning, model selection, and deployment based on a task description. Managed Training is designed for machine learning engineers who want to pick specific methods like SFT or DPO while Fireworks handles GPU provisioning and scaling. The Agent is currently limited to LoRA-based training.

Is the Fireworks Training Platform available for production use?

The Fireworks Training Platform is currently in a preview phase, though it runs on the same infrastructure that serves production traffic for companies like Cursor and Vercel. It is designed to be production-ready by offering one-click deployment where a trained checkpoint becomes a live endpoint in seconds without requiring format conversions or stack migrations.

Does Fireworks AI support custom reinforcement learning objectives?

Yes, the Fireworks Training API allows researchers and advanced teams to bring their own training loops and write custom loss functions. This includes support for objectives like GRPO or DAPO without being restricted to rigid recipes. The platform also enables elastic reinforcement learning rollouts across different regions with weight synchronization for rollout-heavy workflows.
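For context on what such an objective involves, GRPO scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage step in plain Python. It is not the Fireworks API, the function name and `eps` value are assumptions, and real implementations differ in details such as which standard deviation they use.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    advantage_i = (reward_i - mean(rewards)) / (std(rewards) + eps)
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts for one prompt, scored 1.0 (pass) or 0.0 (fail).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)

# If every rollout gets the same reward, no response is favored.
flat = group_relative_advantages([1.0, 1.0, 1.0])
print(flat)
```

These advantages would then weight the policy-gradient term for each response's tokens, so responses that beat their group average are reinforced and the rest are discouraged.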


Keep reading

Fireworks AI Adds Gemma 4 Training to Build Custom Reasoning Agents

Fireworks AI integrated Google's Gemma 4 models into its training platform, enabling full-parameter fine-tuning and DPO with a 256K context window. This allows teams to build specialized reasoning agents on a unified stack that transitions from training to production inference in seconds.

Google · Apr 27

Google Launches Gemma 4 to Bring Frontier Reasoning to Local Devices

Google released Gemma 4, a new family of open models built on the same architecture as Gemini 3 and licensed under Apache 2.0. These models deliver high-performance reasoning and native multimodal capabilities directly on consumer hardware, enabling private, offline agentic workflows. This shift allows developers to build sophisticated AI applications that run entirely on-device without sacrificing intelligence.

Vercel · Apr 2

Vercel brings Google Gemma 4 to AI Gateway for high-performance agentic workflows

Vercel now supports Google's Gemma 4 models on its AI Gateway, offering native function calling and structured JSON output for building autonomous agents. These 26B and 31B models feature a 256K context window and are built on the same architecture as Gemini 3. This integration allows developers to deploy high-performance open models with enterprise-grade reliability and no price markup.

