Fireworks AI Adds Qwen 3.7 Plus With Agentic Reasoning and Caching

Fireworks AI

Jun 13, 2026

Fireworks AI now serves Qwen 3.7 Plus as a direct inference provider, offering full control over latency and data paths. The model supports thinking and non-thinking modes, preserved reasoning history, and prompt caching by default. It is available on serverless endpoints compatible with OpenAI and Anthropic APIs, priced at 0.50 dollars per million input tokens.

Terminal-Bench 2.0 (Terminus-2) Agentic Terminal Coding; SWE-bench Multilingual Agentic Software Engineering; SWE-bench Pro Agentic Software Engineering; NL2Repo Long-Horizon Coding; HLE Humanity's Last Exam; MCP-Mark Realistic MCP Use; ClawEval Real-World Agent; BFCLv4 Agent Tool Calling; BabyVision Visual Understanding; ScreenSpot Pro Screen Understanding; MMBC Multimodal Benchmark; RealWorldQA Real-World Visual QA; Qwen3.7-Plus; Qwen3.6-Plus; DeepSeek-V4-Pro; GLM-5.1; Kimi-K2.6; Claude-Opus-4.6; GPT-5.4; Gemini-3.1-Pro — Model performance benchmarks across coding, agentic, and multimodal tasks comparing Qwen, DeepSeek, GLM, Kimi, Claude, GPT, and Gemini.

View the full update on fireworks.ai

Fireworks AI

@FireworksAI_HQ1d ago

Qwen 3.7 Plus is now live on Fireworks. You get the official weights running on our stack. That means full control of latency, throughput, and data path end-to-end, with zero data retention and our 99.9% SLA. Let’s dig in ↓ https://t.co/4JAmGyj9PE

250

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Qwen Partners with Fireworks AI for Global Access to Qwen 3.6 Plus

Alibaba's Qwen team partnered with Fireworks AI to provide production-ready access to its closed-weights Qwen 3.6 Plus model. This move gives global developers a low-latency, cost-effective way to run Alibaba's flagship intelligence without using Chinese cloud infrastructure.

Fireworks AI Adds Managed Fine-Tuning for Qwen 3.6 27B

Fireworks AIMay 15

Fireworks AI Adds Managed Fine-Tuning for Qwen 3.6 27B

Fireworks AI launched managed fine-tuning for Alibaba's Qwen 3.6 27B model, supporting 256K context windows and out-of-the-box DPO. This allows developers to specialize a high-performance dense model for complex coding and reasoning tasks on a production-ready stack.

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous ResearchMay 27

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous Research integrated Alibaba's Qwen 3.7 Max into its open-source Hermes Agent platform. This allows users to power autonomous multi-step workflows with reasoning models while benefiting from recent cost-saving context caching.

OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouterMay 21

OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouter integrated Alibaba's Qwen3.7-Max, a flagship model optimized for autonomous agent loops and multi-hour task execution. The update introduces explicit prompt caching for the Qwen series, allowing developers to maintain massive context windows at a 90 percent discount on subsequent requests.