Weekends are for vibe coding. But are your vibes continuously improving? Fine-tune your own model → stop waiting on someone else's release cycle. Today's training update: Gemma 4 Dense is now available for Full Param + LoRA RL. SFT, DPO, or RL on 256K context. Get started with the Fireworks Training Platform. https://t.co/rqSamw3I3e
Fireworks AI Adds RL for Gemma 4 Dense to Build Reasoning Agents
Fireworks AI, an inference platform for fast model serving, added reinforcement learning (RL, a method that trains a model by optimizing a reward signal, often one derived from human preferences or automated checks) support for Gemma 4 Dense. This update enables full-parameter training and LoRA (a memory-efficient method that trains small adapter matrices instead of the full weights) on the 256K context window, following Fireworks AI's earlier Gemma 4 integration.
- Context window: 256K tokens
- Training methods: SFT, DPO, and RL
- Training modes: Full-parameter and LoRA
- Max model scale: 1T parameters
- Availability: Fireworks Training Platform Preview
This addresses the "vibe coding" bottleneck where generic models fail on specific domain logic. By running training and serving on one unified infrastructure, Fireworks aims to prevent numerical drift, the gap in model behavior between training and production stacks. The update mirrors Fireworks AI's recent RL support for Kimi K2.6 and GLM 5.1.
Use the Training API to write custom loss functions or managed workflows to specialize Gemma 4 for long-context reasoning. The platform supports elastic RL rollouts across regions, allowing you to scale training without managing GPU clusters. Trained checkpoints can be hot-loaded into production endpoints in seconds.
Fireworks AI
@FireworksAI_HQ
Still wondering? A few quick answers below.
Fireworks AI supports training for Gemma 4 Dense with a context window of up to 256K tokens. This allows developers to perform supervised fine-tuning, direct preference optimization, or reinforcement learning on long-context data. The platform ensures that the model behavior during training matches production inference by using the same underlying hardware and kernels.
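For readers unfamiliar with direct preference optimization, the objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss for a single preference pair, assuming the per-sequence log-probabilities have already been computed; the function name, argument names, and the `beta` default are hypothetical and are not part of the Fireworks Training API.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response under the
    trainable policy and under a frozen reference model (hypothetical names).
    """
    # How much more the policy favors the chosen response over the rejected
    # one, measured relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) computed as softplus(-x), split by sign to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy already prefers the chosen answer more than the reference does,
# so the loss falls below the log(2) value of an undecided policy.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0) < math.log(2))
```

In a real run the log-probabilities would come from model forward passes over long-context examples, and the loss would be averaged over a batch.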
The Fireworks Training Platform supports full-parameter training for models ranging from small dense versions up to trillion-parameter systems like Kimi K2.5. Unlike adapter-based methods like LoRA, full-parameter training allows for deeper behavioral changes. Fireworks manages the distributed systems complexity, including composable parallelism and precision tuning, to support these large-scale training runs.
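The gap between the two training modes is easiest to see as arithmetic. The sketch below counts trainable parameters for one weight matrix under full-parameter training versus a rank-r LoRA adapter (two low-rank factors of shapes d_out × r and r × d_in). The 4096 × 4096 layer size and rank 16 are illustrative assumptions, not Gemma 4's actual dimensions or a Fireworks default.

```python
def full_trainable_params(d_out, d_in):
    """Full-parameter training updates every entry of the weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """LoRA trains two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 16
full = full_trainable_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these assumed sizes the adapter trains under 1% of the layer's weights, which is why LoRA is cheaper in memory and why full-parameter training has more room to change model behavior.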
The Training Agent is an autonomous tool for product teams that handles data cleaning, model selection, and deployment based on a task description. Managed Training is designed for machine learning engineers who want to pick specific methods like SFT or DPO while Fireworks handles GPU provisioning and scaling. The Agent is currently limited to LoRA-based training.
The Fireworks Training Platform is currently in a preview phase, though it runs on the same infrastructure that serves production traffic for companies like Cursor and Vercel. It is designed to be production-ready by offering one-click deployment where a trained checkpoint becomes a live endpoint in seconds without requiring format conversions or stack migrations.
The Fireworks Training API allows researchers and advanced teams to bring their own training loops and write custom loss functions. This includes support for objectives like GRPO or DAPO without being restricted to rigid recipes. The platform also enables elastic reinforcement learning rollouts across different regions, with weight synchronization for rollout-heavy workflows.
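As an illustration of what a GRPO-style objective needs from a custom training loop, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This is a generic sketch, not Fireworks code; the function name, the use of population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of responses sampled for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so the
    group itself serves as the baseline and no value network is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt, scored 1.0 (pass) or 0.0 (fail).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Passing rollouts receive positive advantages and failing ones negative; if every rollout in a group earns the same reward, all advantages are zero and that prompt contributes no gradient signal.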





