Gemma 4 (26B + 31B) from @GoogleDeepMind is now available on the Fireworks Training Platform across the Managed and Training API workflows. Try SFT and DPO with smart defaults or your own custom loss function with a 256K context window. RL support landing soon! What would you like to see next? https://t.co/rqSamw3I3e
Fireworks AI Adds Gemma 4 Training to Build Custom Reasoning Agents
Fireworks AI, an inference platform for fast model serving, has added Google's Gemma 4 models to its training platform. The update covers the 26B Mixture-of-Experts variant (a model that routes each input through specialized sub-networks) and the 31B dense variant. Users can now run supervised fine-tuning (SFT) and direct preference optimization (DPO, a method for aligning model behavior using preference pairs).
- Model variants: 26B and 31B
- Context window: 256K tokens
- Training methods: SFT, DPO, full-parameter
- Max model scale: 1T parameters
- Numerical parity: < 0.01 KL divergence
- Pricing: per token or per GPU-hour
This integration follows Fireworks AI's Day-0 support for Kimi K2.6 and mirrors a broader shift toward teams owning specialized open-weight models. Fireworks provides infrastructure for training models of up to a trillion parameters and aims to remove the "migration tax," where models behave differently in training than in production. Gemma 4's 256K context window suits complex agentic tasks.
You can start training via the Training Agent for automated pipelines or the Training API for custom loss functions. The platform is built for training-inference parity, so evaluated checkpoints are meant to behave the same way in production. Managed training is priced per token; the Training API uses predictable per-GPU-hour pricing.
Fireworks AI
@FireworksAI_HQ
Still wondering? A few quick answers below.
The Fireworks Training Platform is a unified infrastructure for training and deploying custom AI models. It supports full-parameter training, which updates every weight in a model, for systems ranging from small models to trillion-parameter ones. The platform is designed so that a model's behavior during training matches its behavior in production.
Fireworks AI supports the 26B and 31B variants of Google DeepMind's Gemma 4 family. The 26B model uses a Mixture-of-Experts architecture with specialized sub-networks, while the 31B is a dense model. Both versions feature a 256K context window and are available for supervised fine-tuning and direct preference optimization.
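For readers unfamiliar with direct preference optimization, the per-pair loss can be sketched in a few lines of Python. This follows the standard published DPO formulation, not Fireworks' implementation; the function name `dpo_loss`, its parameter names, and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy being trained or under the
    frozen reference model. `beta` is a hypothetical default.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example: the policy has shifted toward the chosen response.
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss equals log 2 when the policy and reference agree, and falls as the policy raises the chosen response's likelihood relative to the rejected one.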
The platform uses the same kernels and hardware for both training and inference, which prevents numerical drift between the two stages. To verify consistency, Fireworks publishes KL divergence values, a statistical measure of how much two probability distributions differ (here, the model's outputs in training versus inference). The goal is for quality measured during evaluation to carry over unchanged when the model is deployed.
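To make the parity metric concrete, here is a minimal Python sketch of a KL divergence check between two sets of next-token logits. How Fireworks actually measures its published numbers is not described in the source; averaging per-token KL over a sequence, the function names, and the sample logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def mean_token_kl(train_logits, serve_logits):
    """Average per-token KL between training-side and serving-side logits.

    Both arguments are lists of per-position logit vectors (hypothetical data).
    """
    kls = [kl_divergence(softmax(t), softmax(s))
           for t, s in zip(train_logits, serve_logits)]
    return sum(kls) / len(kls)


# Toy example: serving logits differ slightly from training logits.
train = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
serve = [[2.01, 1.0, 0.1], [0.5, 0.49, 3.0]]
drift = mean_token_kl(train, serve)
within_budget = drift < 0.01  # threshold taken from the summary's parity figure
```

A value of zero means the two stacks produce identical distributions; small positive values indicate numerical drift.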
Pricing depends on the chosen entry point. Managed training for supervised fine-tuning or direct preference optimization is billed per token or per GPU-hour. The Training API, which allows for custom loss functions and large-scale reinforcement learning, uses a predictable per-GPU-hour pricing model to accommodate complex research and rollout-heavy workflows.
The Training API allows researchers and advanced teams to bring their own training loops and write custom loss functions. This supports specific mathematical objectives like GRPO or DRO without being restricted to pre-defined recipes. It also enables chaining different training stages while preserving the full optimizer state between them.
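As an illustration of the kind of logic a custom objective might contain, the sketch below computes group-relative advantages, one ingredient of a GRPO-style loss. It does not use the Fireworks Training API, whose interface is not shown in the source; the function name, the choice of population standard deviation, and the `eps` value are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    `rewards` holds one scalar reward per response sampled for the same
    prompt. Each advantage is the reward's distance from the group mean,
    scaled by the group's standard deviation (population std assumed here).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1 (correct) or 0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average get positive advantages and the rest get negative ones, so no separate value model is needed to provide a baseline.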





