👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation. ✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks ✅ Versatile coding agent & productivity assistant with full-modality input ✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA ✅ Cross-harness generalization across diverse agent frameworks One model. Sees, thinks, codes, acts.🙌🙌 Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎 🔗🔗⬇️⬇️ Blog:https://t.co/pVYf0h3NNa Qwen Studio:https://t.co/HUYgFW4cYf API:https://t.co/viL0cXrMzW
Alibaba Releases Qwen3.7-Plus to Automate Software Tasks via Unified Vision and Code
- Context Window
- 1,000,000 tokens
- Max Output Tokens
- 65,536
- Input Modalities
- Text, Image, Video
- Framework Support
- Claude Code, OpenClaw, Qwen Code
- Benchmark Performance
- 70.3 on Terminal Bench 2.0
This release makes vision a core component of the agent's reasoning loop. While Qwen3.6-Plus focused on autonomous workflows, qwen3.7-plus demonstrates higher generalization across frameworks. It competes with models like GLM-5V-Turbo by enabling agents to actively build and operate interfaces rather than just understanding them.
Users can access qwen3.7-plus via the Alibaba Cloud Model Studio API, which supports preserve_thinking for multi-turn tasks. The model integrates with Claude Code, OpenClaw, and Qwen Code to automate workflows from frontend prototyping to complex software engineering and multi-step workflow automation.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →





