OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouter

May 21, 2026 · Updated Jun 13, 2026

OpenRouter integrated Alibaba's Qwen3.7-Max, a flagship model optimized for autonomous agent loops and multi-hour task execution. The update introduces explicit prompt caching for the Qwen series, allowing developers to maintain massive context windows at a 90 percent discount on subsequent requests.

OpenRouter, a unified API platform for accessing hundreds of language models, integrated Alibaba's flagship Qwen3.7-Max. This model is specifically architected for agentic AI—systems that autonomously plan and execute multi-step tasks—with a focus on coding and long-horizon productivity. It introduces explicit prompt caching to the Qwen series.

Cache read price: 0.1x base input
Cache write price: 1.25x base input
Cache TTL: 5 minutes
Availability: OpenRouter API
Supported models: Qwen3.7-Max, Qwen-Plus, Qwen3.6-Plus, and others

The update addresses the economic bottleneck of autonomous workflows where agents repeatedly process large codebases. By supporting explicit breakpoints, the model allows for a 90% reduction in costs for cached reads. This follows the Alibaba Qwen3.7-Max launch and complements OpenRouter's Long Horizon Primitives.

You can access qwen/qwen3-max via the OpenRouter API using standard cache_control parameters. To maximize cache hits, the platform uses provider sticky routing to keep subsequent requests on the same endpoint. Caching is also available for other models on the platform, with pricing for Qwen set at 1.25x for writes and 0.1x for reads.

View the full update on openrouter.ai

OpenRouter

@OpenRouterMay 21

The new Qwen3.7-Max from @Alibaba_Qwen is live on OpenRouter. The flagship of the Qwen3.7 series, built for agent-centric work: coding, office and productivity tasks, and long-horizon autonomous execution. Big jumps in coding and agent benchmarks over Qwen3.6, with explicit prompt caching for repeated context.

7128

View on X

Still wondering? A few quick answers below.

Qwen3.7-Max is the new flagship model in Alibaba's Qwen3.7 series, specifically designed for agent-centric workloads. It is architected to handle complex autonomous tasks such as multi-file coding, office productivity, and long-horizon execution. The model shows significant performance improvements in coding and agent benchmarks compared to the previous Qwen3.6 version.

Prompt caching allows users to store large blocks of text, such as codebases or documentation, to be reused in subsequent requests. For Qwen models, this requires adding explicit cache breakpoints to specific content blocks. OpenRouter then uses sticky routing to ensure subsequent requests are sent to the same provider to maximize cache hits and reduce costs.

Using explicit prompt caching with Qwen3.7-Max on OpenRouter involves two distinct costs. Writing new data to the cache is charged at 1.25 times the standard input token price. However, reading from that cache for subsequent requests is charged at only 0.1 times the original input price, representing a 90 percent discount for repeated context.

Explicit prompt caching is supported on several models in the Qwen family, including Qwen3.7-Max, Qwen-Plus, Qwen3.6-Plus, Qwen3-Coder-Plus, and Qwen3-Coder-Flash. It is also available for DeepSeek-V3.2. However, snapshot endpoints such as Qwen3.5-Plus-02-15 and Qwen3.5-Flash-02-23 do not support this explicit caching feature and cannot use the cache control property.

Provider sticky routing is an optimization that automatically routes subsequent requests to the same provider endpoint after a cache is established. This ensures that the model can access the warm cache to provide the 0.1x read discount. OpenRouter tracks this at the account and conversation level to balance load while maintaining high cache hit rates.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenRouter →

Keep reading

Vercel Integrates Qwen 3.7 Max to Power Autonomous Multi Step Agent Workflows

Vercel added Alibaba's Qwen 3.7 Max to its AI Gateway, enabling developers to access the agent-focused model without separate provider accounts. The model is optimized for long-horizon execution, allowing it to maintain reasoning across complex, multi-step tasks like multi-file engineering and office automation.

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

QwenMay 25

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

Qwen introduced implicit and explicit context caching for its flagship Qwen3.7-Max model to reduce API latency and expenses. By allowing developers to pin massive system prompts and tool definitions, the update cuts the cost of repeated inputs by 90 percent.

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous ResearchMay 27

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous Research integrated Alibaba's Qwen 3.7 Max into its open-source Hermes Agent platform. This allows users to power autonomous multi-step workflows with reasoning models while benefiting from recent cost-saving context caching.

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouterMay 19

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter integrated Google's Gemini 3.5 Flash, a model that outperforms the previous 3.1 Pro version in coding and tool use at a lower price point. With a 1 million token context window and adjustable thinking levels, it provides a cost-effective alternative for complex autonomous workflows.

What is Qwen3.7-Max?

How does prompt caching work for Qwen models on OpenRouter?

What is the pricing for Qwen3.7-Max prompt caching?

Which Qwen models support explicit prompt caching on OpenRouter?

What is provider sticky routing on OpenRouter?

Keep reading

Vercel Integrates Qwen 3.7 Max to Power Autonomous Multi Step Agent Workflows

Vercel Integrates Qwen 3.7 Max to Power Autonomous Multi Step Agent Workflows

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

Keep reading

Vercel Integrates Qwen 3.7 Max to Power Autonomous Multi Step Agent Workflows

Vercel Integrates Qwen 3.7 Max to Power Autonomous Multi Step Agent Workflows

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

Qwen Launches Caching for Qwen3.7-Max to Slash Agent Costs by 90 Percent

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

Nous Research Adds Qwen 3.7 Max Support to Hermes Agent

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding

OpenRouter Adds Gemini 3.5 Flash for High Performance Agentic Coding