Another proof point for the open-weights thesis. From @RampLabs: "If we built this again, we'd lean more on open-weight models." Ramp pointed 10K agents at their own backend. Kimi K2.6 and DeepSeek V4 Pro on Fireworks recovered 7 high-severity vulnerabilities at ~5x lower cost per token than GPT 5.5. In a world of scarce (GPU) resources, both cost and value matter. AI leaders today are finding the right balance btw open & closed. "On balance, the hard cases reward a frontier model, but cheaper open-weight models still find high-severity security issues in production code at meaningful rates."
Ramp Labs Deploys 10,000 Agents on Fireworks AI to Slash Security Costs
Fireworks AI, an inference platform for fast model serving, shared a production case study from Ramp Labs involving a massive agentic security audit. Ramp deployed 10,000 autonomous agents powered by Kimi K2.6 and DeepSeek V4 Pro to scan its own backend infrastructure for vulnerabilities.
- SWE-Bench Verified score
- 65.8%
- Cost efficiency
- ~5x lower than GPT 5.5
- Active parameters
- 32 billion
- Total experts
- 384
- Availability
- Fireworks AI API
The audit recovered seven high-severity security issues at 20% of the cost of using GPT 5.5. This shift follows Warp's open weight model routing launch, proving that specialized open models can handle high-stakes agentic loops (iterative cycles of reasoning and action) that were previously cost-prohibitive on proprietary APIs.
Kimi K2.6 utilizes a Mixture-of-Experts architecture (a design where only specialized sub-networks activate per request) to match frontier performance with higher efficiency. You can access these models via the Fireworks AI API, which recently added DeepSeek V4 Pro support with a 1-million token context window.
Fireworks AI
@FireworksAI_HQ
1retweets8likes
View on XStill wondering? A few quick answers below.
Kimi K2 is a Mixture-of-Experts language model designed for autonomous tool use and multi-step reasoning. It features 384 specialized experts, with only 8 activating per token to maintain high efficiency. The model is optimized for agentic workflows, including API orchestration, browser actions, and complex code execution within sandboxed environments.
In specialized software engineering tasks, Kimi K2 achieved a 65.8 percent score on the SWE-Bench Verified benchmark, which is significantly higher than the 54.6 percent score reported for GPT-4.1. It also performs competitively with leading closed-source models like Claude Opus in long-range reasoning and autonomous tool-use scenarios.
Yes, the instruction-tuned version of the model, Kimi-K2-Instruct, is available for deployment through the Fireworks AI inference platform. This variant is optimized with reinforcement learning from human feedback to ensure safe and reliable behavior for chatbots, coding assistants, and autonomous agents using existing LLM pipelines.
Kimi K2 is specifically built for building AI agents that chain thoughts and tools, research workflows requiring retrieval and synthesis, and autonomous software engineering. Its ultra-long context window and Mixture-of-Experts architecture make it ideal for detailed codebase analysis and multi-turn business report synthesis that requires high reasoning depth.
Fireworks AI provides a specialized inference cloud that optimizes the model for speed, quality, and cost. Because Kimi K2 is an open-weight model, the platform allows technical teams to inspect and customize the prompt engineering, tool selection logic, and reward mechanisms while handling the complex infrastructure required for routing.






