Ramp Labs Deploys 10,000 Agents on Fireworks AI to Slash Security Costs

Fireworks AI

May 29, 2026 · Updated Jun 5, 2026

Ramp Labs used a fleet of 10,000 autonomous agents powered by open-weight models to identify high-severity vulnerabilities in its production backend. The deployment achieved a fivefold reduction in token costs compared to GPT 5.5 while maintaining the reasoning depth required for complex security auditing.

Fireworks AI, an inference platform for fast model serving, shared a production case study from Ramp Labs involving a massive agentic security audit. Ramp deployed 10,000 autonomous agents powered by Kimi K2.6 and DeepSeek V4 Pro to scan its own backend infrastructure for vulnerabilities.

SWE-Bench Verified score: 65.8%
Cost efficiency: ~5x lower than GPT 5.5
Active parameters: 32 billion
Total experts: 384
Availability: Fireworks AI API

The audit recovered seven high-severity security issues at 20% of the cost of using GPT 5.5. This shift follows Warp's open weight model routing launch, proving that specialized open models can handle high-stakes agentic loops (iterative cycles of reasoning and action) that were previously cost-prohibitive on proprietary APIs.

Kimi K2.6 utilizes a Mixture-of-Experts architecture (a design where only specialized sub-networks activate per request) to match frontier performance with higher efficiency. You can access these models via the Fireworks AI API, which recently added DeepSeek V4 Pro support with a 1-million token context window.

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQMay 29

Another proof point for the open-weights thesis. From @RampLabs: "If we built this again, we'd lean more on open-weight models." Ramp pointed 10K agents at their own backend. Kimi K2.6 and DeepSeek V4 Pro on Fireworks recovered 7 high-severity vulnerabilities at ~5x lower cost per token than GPT 5.5. In a world of scarce (GPU) resources, both cost and value matter. AI leaders today are finding the right balance btw open & closed. "On balance, the hard cases reward a frontier model, but cheaper open-weight models still find high-severity security issues in production code at meaningful rates."

View on X

Still wondering? A few quick answers below.

Kimi K2 is a Mixture-of-Experts language model designed for autonomous tool use and multi-step reasoning. It features 384 specialized experts, with only 8 activating per token to maintain high efficiency. The model is optimized for agentic workflows, including API orchestration, browser actions, and complex code execution within sandboxed environments.

In specialized software engineering tasks, Kimi K2 achieved a 65.8 percent score on the SWE-Bench Verified benchmark, which is significantly higher than the 54.6 percent score reported for GPT-4.1. It also performs competitively with leading closed-source models like Claude Opus in long-range reasoning and autonomous tool-use scenarios.

Yes, the instruction-tuned version of the model, Kimi-K2-Instruct, is available for deployment through the Fireworks AI inference platform. This variant is optimized with reinforcement learning from human feedback to ensure safe and reliable behavior for chatbots, coding assistants, and autonomous agents using existing LLM pipelines.

Kimi K2 is specifically built for building AI agents that chain thoughts and tools, research workflows requiring retrieval and synthesis, and autonomous software engineering. Its ultra-long context window and Mixture-of-Experts architecture make it ideal for detailed codebase analysis and multi-turn business report synthesis that requires high reasoning depth.

Fireworks AI provides a specialized inference cloud that optimizes the model for speed, quality, and cost. Because Kimi K2 is an open-weight model, the platform allows technical teams to inspect and customize the prompt engineering, tool selection logic, and reward mechanisms while handling the complex infrastructure required for routing.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Fireworks AI →

Keep reading

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Fireworks AI released a benchmark report on browser agents revealing that malformed outputs create a hidden execution tax that inflates production costs. The study found that reliability in multi-step loops matters more than raw intelligence, with some frontier models wasting nearly a quarter of their inference budget on retries.

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API

GoogleMay 21

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API

Ramp used the new Managed Agents in the Gemini API to build and deploy advanced financial agents without managing backend infrastructure. This shift allows teams to offload complex agent orchestration and state management to Google, significantly reducing the engineering overhead required for production-grade autonomous workflows.

What is Kimi K2?

How does Kimi K2 performance compare to GPT-4?

Is Kimi K2 available for public use?

What are the main use cases for Kimi K2?

How does Fireworks AI optimize Kimi K2 deployment?

Keep reading

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API

Keep reading

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Fireworks AI Benchmark Reveals Hidden Execution Tax Draining Agent Budgets

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API

Ramp Deploys Finance Agents Using Google Managed Agents for Gemini API