Kimi K2.5 is now available on #WorkersAI. You can now build and run agents end-to-end on the Cloudflare Developer Platform. Read about how we tuned our inference stack to drive down costs for internal agent workflows. https://t.co/kEQ6HHpoJS
Cloudflare Workers AI Adds Kimi K2.5 for End-to-End Agent Workflows
Cloudflare Workers AI now runs Kimi K2.5, Moonshot AI's frontier open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs. Workers AI previously focused on smaller models, so this marks its entry into large-model inference. Cloudflare built custom kernels on its Infire inference engine, handling the tensor and parallelization optimizations you'd otherwise need a dedicated ML team to manage.
Cloudflare's internal security review agent, which processes over 7 billion tokens per day, cost 77% less to run on Kimi K2.5 than on a mid-tier proprietary model. As agentic workloads scale, open-source frontier models with this price-performance ratio become the practical path for enterprises running high-volume inference.
The model is available as @cf/moonshotai/kimi-k2.5 on Workers AI. A new x-session-affinity header improves prefix cache hit rates for multi-turn sessions, and a revamped async API handles batch inference without capacity errors.
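To make the model ID and the session-affinity header concrete, here is a minimal TypeScript sketch that builds, but does not send, a request against the Workers AI REST endpoint. The helper name buildKimiRequest and the ChatMessage type are hypothetical. The sketch also assumes that x-session-affinity takes an opaque per-conversation identifier and that the model accepts a messages array like other Workers AI chat models; check the Workers AI documentation for the exact contract.

```typescript
// Sketch only: assembles a Workers AI REST request for Kimi K2.5 without sending it.
// buildKimiRequest and ChatMessage are illustrative names, not part of any Cloudflare SDK.
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

type BuiltRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Model ID as given in the announcement.
const MODEL = "@cf/moonshotai/kimi-k2.5";

function buildKimiRequest(
  accountId: string,
  apiToken: string,
  sessionId: string,
  messages: ChatMessage[],
): BuiltRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
        // Assumption: the header value is an opaque ID that stays the same for
        // every turn of one conversation, so turns can share a prefix cache.
        "x-session-affinity": sessionId,
      },
      // Assumption: a chat-style body with a messages array.
      body: JSON.stringify({ messages }),
    },
  };
}
```

The returned url and init can be passed straight to fetch(url, init). Reusing the same sessionId across the turns of a conversation is the part intended to help prefix cache hit rates.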
Cloudflare
@Cloudflare




