HeadsUpAI

Qwen3-Coder-Next GGUF Tops Unsloth Downloads for Local Coding Agents

· Updated

Unsloth, a model fine-tuning and quantization tool, reports that their GGUF quantization of Qwen3-Coder-Next hit 502K downloads - the most downloaded model on their platform. The 80B MoE coding model uses just 3B active parameters, letting it run on devices with 36GB RAM or unified memory. Lower-precision quants (3-bit) work on even smaller setups.

The practical value is running a competitive coding agent entirely on local hardware. Qwen3-Coder-Next scored over 70% on SWE-Bench Verified, and the GGUF version works directly as a backend for Claude Code and Codex through llama.cpp, with no API costs or cloud dependency. Unsloth's Dynamic GGUF format preserves model quality at reduced precision better than standard quantization.

Point Claude Code or Codex at a local llama.cpp server endpoint - the Unsloth guide covers setup for different RAM configurations and optimal generation parameters.

Unsloth AI
Unsloth AI
@UnslothAI
X

Qwen3-Coder-Next GGUF is now the most downloaded model on Unsloth! The 80B coding LLM runs on a 36GB RAM Mac / device. Use via Claude Code and Codex locally. https://t.co/KXvLJ8gsM1

90retweets
View on X

Share this update