Fireworks AI

Fireworks AI AI News & Updates

The latest AI news and updates of Fireworks AI — AI inference platform for fast, customizable model serving and compound AI systems at scale. Covering Fireworks AI's latest product updates, analysis, and company news from the past 90 days.

Fireworks AIFireworks AI14h ago

Fireworks AI Adds Qwen 3.7 Plus With Agentic Reasoning and Caching

Fireworks AI now serves Qwen 3.7 Plus as a direct inference provider, offering full control over latency and data paths. The model supports thinking and non-thinking modes, preserved reasoning history, and prompt caching by default. It is available on serverless endpoints compatible with OpenAI and Anthropic APIs, priced at 0.50 dollars per million input tokens.

Read more
Fireworks AIFireworks AI15h ago

Fireworks AI Adds Day-0 Support for MiniMax M3 Multimodal Model

Fireworks AI launched day-0 support for MiniMax M3, a multimodal model featuring native image and video input. Powered by MiniMax Sparse Attention, the model delivers 9× faster prefill and 15× faster decode speeds. It is available now on serverless and on-demand endpoints, priced at parity with M2.7 at $0.30 per million input tokens.

Read more

Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning

Fireworks AI now offers NVIDIA Nemotron 3 Ultra, an open model for advanced autonomous agents, with immediate deployment support. This provides developers with optimized infrastructure for long-running agentic tasks that require frontier reasoning and orchestration.

Read more

Fireworks AI adds Step 3.7 Flash for high speed agentic reasoning

Fireworks AI has deployed Step 3.7 Flash, a 198B-parameter vision-language model designed for rapid inference. The model enables real-time agentic workflows by delivering up to 400 tokens per second with selectable reasoning depths.

Read more

Fireworks AI hosts MiniMax M3 with 15x faster long context decoding

Fireworks AI is now powering inference for MiniMax M3, a multimodal model featuring a novel sparse attention architecture. The partnership enables 15.6x faster decoding at 1-million-token context, making real-time agentic workflows viable at scale.

Read more

Fireworks AI Research Shows Hybrid Agents Outperform Monolithic Frontier Models

Fireworks AI demonstrated that a GLM 5.1 worker using Claude Opus 4.7 as a sparse advisor beats standalone Opus on legal benchmarks. This architectural shift achieves higher accuracy on complex tasks while reducing inference costs by over 60%.

Read more

Microsoft brings MAI reasoning models to Fireworks for enterprise fine-tuning

Microsoft AI is launching a family of seven in-house models and partnering with Fireworks AI to enable weight-level customization. This allows organizations to integrate proprietary institutional knowledge directly into Microsoft's frontier reasoning models.

Read more
Fireworks AIFireworks AIMay 30

Fireworks AI Serverless 2.0 Adds Priority Lanes Without Reserved GPUs

Fireworks AI launched Serverless 2.0, introducing three distinct serving paths—Standard, Priority, and Fast—to its inference platform. By allowing developers to choose between cost-efficiency, congestion reliability, or high throughput at the request level, the update removes the binary choice between shared fleets and expensive reserved capacity.

Read more
Fireworks AIFireworks AIMay 29

Fireworks AI earns NVIDIA CEO Jensen Huang endorsement as AI foundry

NVIDIA CEO Jensen Huang characterized Fireworks AI as the TSMC of AI factories, highlighting the company's specialized role in the inference stack. This endorsement signals a shift where high-performance inference providers are becoming the essential foundries for the generative AI era.

Read more
Fireworks AIFireworks AIMay 29

Ramp Labs Deploys 10,000 Agents on Fireworks AI to Slash Security Costs

Ramp Labs used a fleet of 10,000 autonomous agents powered by open-weight models to identify high-severity vulnerabilities in its production backend. The deployment achieved a fivefold reduction in token costs compared to GPT 5.5 while maintaining the reasoning depth required for complex security auditing.

Fireworks AIFireworks AIMay 27

Cursor and Fireworks AI Detail the Specialized Training Infrastructure Behind Composer 2.5

Cursor and Fireworks AI shared a technical breakdown of the distributed reinforcement learning infrastructure used to build the Composer 2.5 coding model. The team treats model weights as finite storage bits dedicated entirely to software engineering, allowing the model to match frontier performance at one-tenth the cost. This shift demonstrates how specialized products can use real-world usage as a proprietary training loop.

Read more
Fireworks AIFireworks AIMay 20

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Fireworks AI released a benchmark report on browser agents revealing that malformed outputs create a hidden execution tax that inflates production costs. The study found that reliability in multi-step loops matters more than raw intelligence, with some frontier models wasting nearly a quarter of their inference budget on retries.

Read more
Fireworks AIFireworks AIMay 16

Fireworks AI Adds RL for Gemma 4 Dense to Build Reasoning Agents

Fireworks AI expanded its training platform to support full-parameter and LoRA-based reinforcement learning for Google's Gemma 4 Dense model. This allows developers to perform SFT, DPO, or RL on the model's full 256K context window using a unified stack that eliminates numerical drift between training and production.

Read more
Fireworks AIFireworks AIMay 15

Fireworks AI Adds Managed Fine-Tuning for Qwen 3.6 27B

Fireworks AI launched managed fine-tuning for Alibaba's Qwen 3.6 27B model, supporting 256K context windows and out-of-the-box DPO. This allows developers to specialize a high-performance dense model for complex coding and reasoning tasks on a production-ready stack.

Read more
Fireworks AIFireworks AIMay 15

Fireworks AI Expands Azure Foundry Catalog With Frontier Reasoning Models

Fireworks AI added DeepSeek V4 Pro and Kimi K2.6 to Microsoft Azure AI Foundry while expanding provisioned throughput support to the US Data Zone. The update allows enterprise teams to run high-performance open models with guaranteed throughput and data residency within their existing Azure environment.

Read more
Fireworks AIFireworks AIMay 15

Fireworks AI Adds Reinforcement Learning for GLM 5.1 to Build Custom Agents

Fireworks AI expanded its training platform to support LoRA-based reinforcement learning for Z.ai's GLM 5.1 model. This allows developers to align the model's reasoning steps with specific domain logic using custom loss functions on a 200K context window.

Read more
Fireworks AIFireworks AIMay 13

Fireworks AI Launches Full Parameter RL Training for Kimi K2.6

Fireworks AI added full-parameter reinforcement learning support for Moonshot AI's 1-trillion parameter Kimi K2.6 model. This allows developers to tune the entire model weight set on proprietary data to build specialized agentic moats that outperform off-the-shelf frontier systems.

Read more

Fireworks AI Uses Delta Compression to Reduce Frontier RL Training Costs

Fireworks AI introduced a distributed reinforcement learning architecture that uses delta-compressed weight updates to sync training and inference clusters across different regions. By shipping only the 2% of weights that change between checkpoints, teams can train frontier-scale models using fragmented GPU capacity instead of expensive mega-clusters.

Fireworks AIFireworks AIApr 30

Fireworks AI Adds Safe Tokenization to Stop Users Overriding System Prompts

Fireworks AI introduced an opt-in safe_tokenization flag that prevents user input from being parsed as model control tokens. This update addresses a fundamental security flaw in open-weights inference where malicious text can forge turn boundaries to bypass system instructions. By separating user content from structural code at the tokenizer level, developers can ensure their core product logic remains authoritative.

Fireworks AIFireworks AIApr 30

Fireworks AI Adds Qwen 3.5 Training to Build Custom Reasoning Agents

Fireworks AI integrated Alibaba's Qwen 3.5 into its training platform, supporting full-parameter fine-tuning and reinforcement learning with a 256K context window. This allows developers to customize the high-performance open-weight model for specialized reasoning and coding tasks on a unified stack.

Fireworks AIFireworks AIApr 28

Fireworks AI Adds Gemma 4 Training to Build Custom Reasoning Agents

Fireworks AI integrated Google's Gemma 4 models into its training platform, enabling full-parameter fine-tuning and DPO with a 256K context window. This allows teams to build specialized reasoning agents on a unified stack that transitions from training to production inference in seconds.

Fireworks AIFireworks AIApr 28

Fireworks AI Adds GLM 5.1 Training to Build Long Horizon Coding Agents

Fireworks AI added Z.ai's GLM 5.1 to its training platform, supporting supervised fine-tuning and direct preference optimization with a 200K context window. This allows developers to customize the flagship agentic model for multi-hour autonomous tasks without the numerical drift common in fragmented training and inference stacks.

Fireworks AIFireworks AIApr 27

Fireworks AI Hosts DeepSeek V4 Pro With 1M Context and MIT License

Fireworks AI added DeepSeek V4-Pro to its inference platform, offering frontier-level coding and reasoning via an open-weight model. The deployment standardizes a 1-million token context window at a price point significantly lower than closed-source competitors.

Fireworks AIFireworks AIApr 25

Fireworks AI Adds Kimi K2.6 Training to Build Custom Frontier Agents

Fireworks AI added Moonshot AI's Kimi K2.6 to its training platform, enabling supervised fine-tuning and reinforcement learning on the 1-trillion parameter model. This allows teams to customize the leading open-weight agentic model for specific production workflows while maintaining a 265K context window.

Fireworks AIFireworks AIApr 21

Fireworks AI Launches Day-0 Support for Kimi K2.6 Agentic Model

Fireworks AI added immediate support for Kimi K2.6, a 1-trillion parameter multimodal model optimized for long-horizon agentic coding. The update provides the high-speed inference and fine-tuning infrastructure needed to run the successor to the model that powered Cursor's Composer 2.

Fireworks AI Launches Training Platform to Fine-Tune Frontier Models at Scale

Fireworks AI released a training platform in preview that supports full-parameter fine-tuning for models ranging from 8B to 1T parameters. This allows teams to move beyond prompt engineering by using reinforcement learning to build proprietary models that outperform closed frontier systems on specific tasks.

Fireworks AI Launches Infrastructure for Training Trillion Parameter MoE Models

Fireworks AI released a major update to its Training SDK featuring Blackwell-native kernels and 4D parallelism for trillion-parameter Mixture-of-Experts models. By fusing reinforcement learning losses and optimizing for asynchronous data, the platform enables frontier-grade model training that was previously restricted to elite research labs.

Fireworks AIFireworks AIMar 28

Fireworks AI Powers Cursor Composer 2 With Distributed Global RL Infrastructure

Fireworks AI revealed the infrastructure behind Cursor's Composer 2, using disaggregated sampling to run RL across multiple global clusters. By shipping only 2% of model weights as compressed deltas, they eliminated the need for a single massive mega-cluster. This shift makes frontier-scale RL training economically viable using fragmented, multi-region GPU capacity.

Frequently asked questions

Fireworks AI is AI inference platform for fast, customizable model serving and compound AI systems at scale. HeadsUpAI tracks Fireworks AI across the AI ecosystem and curates every significant update — the latest being "Fireworks AI Adds Qwen 3.7 Plus With Agentic Reasoning and Caching" (June 13, 2026) — so you get the whole story in a 30-second read.

The most recent Fireworks AI update is "Fireworks AI Adds Qwen 3.7 Plus With Agentic Reasoning and Caching" (June 13, 2026). HeadsUpAI curates every significant Fireworks AI release as a 30-second read — what shipped and why it matters.

The latest Fireworks AI updates: "Fireworks AI Adds Qwen 3.7 Plus With Agentic Reasoning and Caching", "Fireworks AI Adds Day-0 Support for MiniMax M3 Multimodal Model", "Fireworks AI Adds NVIDIA Nemotron 3 Ultra for Agentic Reasoning", "Fireworks AI adds Step 3.7 Flash for high speed agentic reasoning", and "Fireworks AI hosts MiniMax M3 with 15x faster long context decoding". HeadsUpAI has curated 28 Fireworks AI updates over the last 90 days, covering product updates, analysis, and company news — listed newest first, presented straight, no hype, no bias.

Fireworks AI is AI inference platform for fast, customizable model serving and compound AI systems at scale. On this page you'll find every significant Fireworks AI development HeadsUpAI has tracked recently — product updates, analysis, and company news — so you can keep up with where Fireworks AI is heading without reading a dozen sources.

Continuously. HeadsUpAI adds new Fireworks AI updates as they're announced — usually within hours — and the 28 updates currently shown cover the past 90 days, newest first.