HeadsUpAI

OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouter, a unified API platform for accessing hundreds of language models, integrated Alibaba's flagship Qwen3.7-Max. This model is specifically architected for agentic AI—systems that autonomously plan and execute multi-step tasks—with a focus on coding and long-horizon productivity. It introduces explicit prompt caching to the Qwen series.
Cache read price
0.1x base input
Cache write price
1.25x base input
Cache TTL
5 minutes
Availability
OpenRouter API
Supported models
Qwen3.7-Max, Qwen-Plus, Qwen3.6-Plus, and others

The update addresses the economic bottleneck of autonomous workflows where agents repeatedly process large codebases. By supporting explicit breakpoints, the model allows for a 90% reduction in costs for cached reads. This follows the Alibaba Qwen3.7-Max launch and complements OpenRouter's Long Horizon Primitives.

You can access qwen/qwen3-max via the OpenRouter API using standard cache_control parameters. To maximize cache hits, the platform uses provider sticky routing to keep subsequent requests on the same endpoint. Caching is also available for other models on the platform, with pricing for Qwen set at 1.25x for writes and 0.1x for reads.

OpenRouter
OpenRouter
@OpenRouter
X

The new Qwen3.7-Max from @Alibaba_Qwen is live on OpenRouter. The flagship of the Qwen3.7 series, built for agent-centric work: coding, office and productivity tasks, and long-horizon autonomous execution. Big jumps in coding and agent benchmarks over Qwen3.6, with explicit prompt caching for repeated context.

7retweets128likes
View on X

Still wondering? A few quick answers below.

Qwen3.7-Max is the new flagship model in Alibaba's Qwen3.7 series, specifically designed for agent-centric workloads. It is architected to handle complex autonomous tasks such as multi-file coding, office productivity, and long-horizon execution. The model shows significant performance improvements in coding and agent benchmarks compared to the previous Qwen3.6 version.

Prompt caching allows users to store large blocks of text, such as codebases or documentation, to be reused in subsequent requests. For Qwen models, this requires adding explicit cache breakpoints to specific content blocks. OpenRouter then uses sticky routing to ensure subsequent requests are sent to the same provider to maximize cache hits and reduce costs.

Using explicit prompt caching with Qwen3.7-Max on OpenRouter involves two distinct costs. Writing new data to the cache is charged at 1.25 times the standard input token price. However, reading from that cache for subsequent requests is charged at only 0.1 times the original input price, representing a 90 percent discount for repeated context.

Explicit prompt caching is supported on several models in the Qwen family, including Qwen3.7-Max, Qwen-Plus, Qwen3.6-Plus, Qwen3-Coder-Plus, and Qwen3-Coder-Flash. It is also available for DeepSeek-V3.2. However, snapshot endpoints such as Qwen3.5-Plus-02-15 and Qwen3.5-Flash-02-23 do not support this explicit caching feature and cannot use the cache control property.

Provider sticky routing is an optimization that automatically routes subsequent requests to the same provider endpoint after a cache is established. This ensures that the model can access the warm cache to provide the 0.1x read discount. OpenRouter tracks this at the account and conversation level to balance load while maintaining high cache hit rates.

Share this update