Qwen3-Coder-Next GGUF Tops Unsloth Downloads for Local Coding Agents

Qwen

Feb 23, 2026 · Updated Apr 25, 2026

Unsloth's GGUF quantization of Qwen3-Coder-Next hit 502K downloads, becoming the platform's most popular model. The 80B coding model runs locally on a 36GB Mac and works as a backend for Claude Code and Codex, bringing agentic coding to consumer hardware.

Unsloth, a model fine-tuning and quantization tool, reports that their GGUF quantization of Qwen3-Coder-Next hit 502K downloads - the most downloaded model on their platform. The 80B MoE coding model uses just 3B active parameters, letting it run on devices with 36GB RAM or unified memory. Lower-precision quants (3-bit) work on even smaller setups.

The practical value is running a competitive coding agent entirely on local hardware. Qwen3-Coder-Next scored over 70% on SWE-Bench Verified, and the GGUF version works directly as a backend for Claude Code and Codex through llama.cpp, with no API costs or cloud dependency. Unsloth's Dynamic GGUF format preserves model quality at reduced precision better than standard quantization.

Point Claude Code or Codex at a local llama.cpp server endpoint - the Unsloth guide covers setup for different RAM configurations and optimal generation parameters.

View the full update on unsloth.ai

Unsloth AI

@UnslothAIFeb 23

Qwen3-Coder-Next GGUF is now the most downloaded model on Unsloth! The 80B coding LLM runs on a 36GB RAM Mac / device. Use via Claude Code and Codex locally. https://t.co/KXvLJ8gsM1

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Qwen →

Keep reading

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama added the Qwen 3.6 27B model to its library, enabling local execution of the latest open-weight coding model. The update introduces direct integration with agentic frameworks like OpenClaw and Claude Code, allowing developers to run autonomous coding workflows entirely on local hardware.

Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues

Simon WillisonMar 31

Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues

Georgi Gerganov identified Qwen 3.5 as a major advancement for local coding tasks across various hardware sizes. He noted that disappointing performance in local agents often stems from the software harness and prompt construction rather than the model itself. This highlights the need for precise integration to match frontier-level agentic capabilities.

Qwen 3.5 Vision Models Now Runnable Locally via Ollama

QwenFeb 25

Qwen 3.5 Vision Models Now Runnable Locally via Ollama

Qwen 3.5 vision models are now available locally via Ollama, with the 35B fitting on a 24GB GPU. All three models include built-in vision, expanded language support, and improved efficiency compared to previous Qwen releases.

OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouterMay 21

OpenRouter Adds Qwen3.7-Max for Long Horizon Agentic Coding and Office Tasks

OpenRouter integrated Alibaba's Qwen3.7-Max, a flagship model optimized for autonomous agent loops and multi-hour task execution. The update introduces explicit prompt caching for the Qwen series, allowing developers to maintain massive context windows at a 90 percent discount on subsequent requests.