Fireworks AI Adds Gemma 4 Training to Build Custom Reasoning Agents

Q: What is the Fireworks AI Training Platform?

It is a unified platform for training and deploying custom AI models. It supports full-parameter training, which updates every weight in a model, at scales ranging from small models to trillion-parameter systems. The platform is designed so that a model's behavior during training matches its behavior in production.

Q: Which Gemma 4 models are available on Fireworks AI?

Fireworks AI supports the 26B and 31B variants of Google DeepMind's Gemma 4 family. The 26B model uses a Mixture-of-Experts architecture with specialized sub-networks, while the 31B is a dense model. Both versions feature a 256K context window and are available for supervised fine-tuning and direct preference optimization.

Q: How does Fireworks AI ensure training-inference parity?

The platform uses the same kernels and hardware for both training and inference, which prevents numerical drift between the two stages. To verify consistency, Fireworks publishes KL divergence values, a statistical measure of how far one probability distribution (here, a model's output distribution) diverges from another. Low values indicate that the quality measured during evaluation should carry over to deployment.
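
As a rough illustration of what a KL divergence check measures, the Python sketch below compares next-token probability distributions from two hypothetical runs of the same model, one standing in for the training stack and one for the inference stack. The function names and the toy numbers are assumptions made for illustration; the article does not describe how Fireworks computes its published values.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same vocabulary")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # Clamp q away from zero so the logarithm stays finite.
            total += pi * math.log(pi / max(qi, eps))
    return total


def mean_token_kl(train_dists, infer_dists):
    """Average the per-token KL over aligned token positions."""
    pairs = list(zip(train_dists, infer_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Hypothetical next-token distributions over a 3-token vocabulary,
# one list per token position. These numbers are made up.
train_side = [[0.70, 0.20, 0.10], [0.05, 0.90, 0.05]]
infer_side = [[0.69, 0.21, 0.10], [0.05, 0.89, 0.06]]

drift = mean_token_kl(train_side, infer_side)
print(f"mean per-token KL: {drift:.6f} nats")
```

A value near zero means the two stacks assign nearly the same probability to every token. Identical distributions give exactly zero.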

Q: What is the pricing for training models on Fireworks AI?

Pricing depends on the chosen entry point. Managed training for supervised fine-tuning or direct preference optimization is billed per token or per GPU-hour. The Training API, which allows for custom loss functions and large-scale reinforcement learning, uses a predictable per-GPU-hour pricing model to accommodate complex research and rollout-heavy workflows.

Q: Can I use custom loss functions on the Fireworks Training Platform?

Yes, the Training API allows researchers and advanced teams to bring their own training loops and write custom loss functions. This supports specific mathematical objectives like GRPO or DRO without being restricted to pre-defined recipes. It also enables chaining different training stages while preserving the full optimizer state between them.
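
To make "custom loss function" concrete, here is a minimal Python sketch of the group-relative advantage idea behind GRPO: rewards for several sampled answers to the same prompt are normalized within the group, then used to weight each answer's log-probability. This is a simplification that omits the clipped probability ratio and KL penalty of the full method, and it does not use the Fireworks Training API, whose interface the article does not show. All names and numbers are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of answers sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every answer gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def policy_gradient_loss(logprobs, advantages):
    """Negative advantage-weighted mean log-probability (no clipping, no KL term)."""
    weighted = sum(lp * a for lp, a in zip(logprobs, advantages))
    return -weighted / len(logprobs)


# Hypothetical data: four sampled answers to one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]           # e.g. 1.0 if the answer was judged correct
logprobs = [-12.3, -15.1, -9.8, -11.0]   # sequence log-probs under the current policy

advantages = group_relative_advantages(rewards)
loss = policy_gradient_loss(logprobs, advantages)
print(advantages, loss)
```

Minimizing this loss raises the probability of answers that scored above the group average and lowers it for those below. In a real training loop the log-probabilities would be differentiable tensors rather than plain floats.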

Fireworks AI

Apr 28, 2026

Fireworks AI has integrated Google's Gemma 4 models into its training platform, enabling full-parameter fine-tuning and DPO with a 256K context window. Teams can build specialized reasoning agents on a unified stack that moves a trained model into production inference in seconds.

Fireworks AI, an inference platform for fast model serving, added Google's Gemma 4 models to its training platform. The update supports the 26B Mixture-of-Experts variant (a model that uses specialized sub-networks) and the 31B dense variant. Users can now perform supervised fine-tuning (SFT) and direct preference optimization (DPO, a method that aligns model behavior using pairs of preferred and rejected responses).
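
For readers unfamiliar with DPO, the Python sketch below computes the standard DPO loss for a single preference pair from four sequence log-probabilities: the chosen and rejected responses, each scored by the policy being trained and by a frozen reference model. The function name, the beta value, and the example numbers are assumptions for illustration and are not taken from the Fireworks platform.

```python
import math


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. The beta parameter controls how strongly the policy is held near that reference.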

Model variants: 26B and 31B
Context window: 256K tokens
Training methods: SFT, DPO, Full-parameter
Max model scale: 1T parameters
Numerical parity: < 0.01 KL divergence
Pricing: Per token or GPU-hour

This integration follows Fireworks AI's Day-0 support for Kimi K2.6 and reflects a broader shift toward teams owning specialized open-weight models. By running training and inference on one stack, at scales up to trillion-parameter models, Fireworks aims to remove the "migration tax," where a model behaves differently in production than it did in training. Gemma 4's 256K context window suits complex agentic tasks.

You can start training via the Training Agent for automated pipelines or the Training API for custom loss functions. The platform ensures training-inference parity, so evaluated checkpoints match production performance exactly. Managed training is priced per token; the Training API uses predictable per-GPU-hour pricing.

View the full update on fireworks.ai

Fireworks AI (@FireworksAI_HQ), Apr 27:

Gemma 4 (26B + 31B) from @GoogleDeepMind is now available on the Fireworks Training Platform across the Managed and Training API workflows. Try SFT and DPO with smart defaults or your own custom loss function with a 256K context window. RL support landing soon! What would you like to see next? https://t.co/rqSamw3I3e
