Qwen 3.5 Vision Models Now Runnable Locally via Ollama


Feb 25, 2026 · Updated Apr 25, 2026

Qwen 3.5 vision models are now available locally via Ollama, with the 35B fitting on a 24GB GPU. All three models include built-in vision, expanded language support, and improved efficiency compared to previous Qwen releases.

Ollama has added the Qwen 3.5 family to its local model library. The 35B model, which uses a mixture-of-experts (MoE) architecture, runs on systems with 24GB or more of GPU memory: ollama run qwen3.5:35b. The 122B variant (ollama run qwen3.5:122b) targets higher-end setups, while the 397B model remains cloud-only. All three have vision built in, replacing the separate VL variants of previous Qwen releases.

The shift to native vision matters for local AI workflows: you no longer need a separate vision model or a cloud API to handle images and documents alongside text. Qwen 3.5 also adds broader multilingual support and more efficient inference, making it a direct upgrade path for anyone already running Qwen locally.
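As a rough illustration of what "images alongside text" can look like in practice, here is a minimal sketch that sends one image and a prompt to a locally running Ollama server through its /api/generate endpoint. It assumes Ollama is listening on its default port (11434) and that qwen3.5:35b has already been pulled. The helper names build_vision_request and ask_about_image are hypothetical, chosen only for this example.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate payload carrying one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_about_image(image_path: str, prompt: str, model: str = "qwen3.5:35b") -> str:
    """Send an image plus a text prompt to the local server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_vision_request(model, prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False the server returns a single JSON object.
        return json.loads(resp.read())["response"]
```

A call such as ask_about_image("invoice.png", "Summarize this document.") would then go through a single local model, where invoice.png is a placeholder file name.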

For 24GB GPU owners, ollama run qwen3.5:35b is the fastest way to try it. The 122B model is available for workstations with more VRAM.
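The 24GB figure is easier to sanity-check with some back-of-the-envelope arithmetic. The sketch below estimates the memory taken by the weights alone at a few precisions. The source does not state which quantization the Ollama build uses, so the bit widths here are assumptions, and the estimate leaves out the KV cache and runtime overhead. It uses the total parameter count, since an MoE model's full set of experts normally has to be resident in memory even though only some are active per token.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


# Assumed precisions for illustration; the actual build may differ.
for bits in (4, 8, 16):
    print(f"35B at {bits}-bit: ~{weight_memory_gb(35, bits):.1f} GB")
# 35B at 4-bit: ~17.5 GB
# 35B at 8-bit: ~35.0 GB
# 35B at 16-bit: ~70.0 GB
```

Under these assumptions, only the roughly 4-bit case leaves headroom on a 24GB card, which is consistent with the "fits on 24GB+" claim but does not confirm how the model is actually packaged.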

View the full update on ollama.com

ollama (@ollama) · Feb 25

Qwen 3.5 family is here!
> vision built-in, and can outperform previous VL models
> designed to be more efficient
> expanded support for more languages

35B: (fits on 24GB+ system)
ollama run qwen3.5:35b

122B:
ollama run qwen3.5:122b

397B (cloud only):
ollama run https://t.co/JIobYmBgCj



Keep reading

Qwen3.5 Small Series Brings Native Multimodal AI to Edge Devices

Qwen released four compact multimodal models (0.8B, 2B, 4B, and 9B) with native vision and scaled reinforcement learning (RL). The 9B outperforms models 13x larger on graduate-level reasoning, making capable multimodal AI viable on edge devices and in lightweight agents.

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama · Apr 24

Ollama added the Qwen 3.6 27B model to its library, enabling local execution of the latest open-weight coding model. The update introduces direct integration with agentic frameworks like OpenClaw and Claude Code, allowing developers to run autonomous coding workflows entirely on local hardware.

Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues

Simon Willison · Mar 31

Georgi Gerganov identified Qwen 3.5 as a major advancement for local coding tasks across various hardware sizes. He noted that disappointing performance in local agents often stems from the software harness and prompt construction rather than the model itself. This highlights the need for precise integration to match frontier-level agentic capabilities.

ASI:Cloud Adds MiniMax M2.5, Qwen, and GLM Models for Inference

CUDOS · Mar 19

ASI:Cloud, a serverless AI inference platform, added three new models for immediate use: MiniMax M2.5, Qwen 3.5-35B-A3B, and GLM 4.7 Flash. There is no waitlist; all three are live now via its OpenAI-compatible API.