Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues

Simon Willison

Mar 31, 2026 · Updated Jun 5, 2026

Georgi Gerganov identified Qwen 3.5 as a major advancement for local coding tasks across various hardware sizes. He noted that disappointing performance in local agents often stems from the software harness and prompt construction rather than the model itself. This highlights the need for precise integration to match frontier-level agentic capabilities.

Georgi Gerganov, the creator of llama.cpp, identified the Qwen 3.5 model series as a significant advancement for local AI development. While these models are highly capable, Gerganov noted that poor performance often stems from the harness—the specific implementation of chat templates and prompt construction required for agentic workflows.

This insight addresses a growing frustration among developers attempting to run autonomous agents like Claude Code or Codex using local models. Even with powerful hardware, subtle differences in how a model expects instructions can lead to failures that appear to be a lack of reasoning but are actually integration errors.

If you are building local coding agents, prioritize testing with the Qwen 3.5 family across your hardware. Success requires moving beyond generic prompts and ensuring your agentic framework is specifically tuned to the model's native chat template. This alignment is critical for achieving the reliability seen in proprietary frontier models.

Simon Willison

@simonwMar 30

Georgi on why it's still hard to get great coding agent performance from local models: "Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction"

5159

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Qwen3-Coder-Next GGUF Tops Unsloth Downloads for Local Coding Agents

Unsloth's GGUF quantization of Qwen3-Coder-Next hit 502K downloads, becoming the platform's most popular model. The 80B coding model runs locally on a 36GB Mac and works as a backend for Claude Code and Codex, bringing agentic coding to consumer hardware.

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

OllamaApr 24

Ollama Launches Qwen 3.6 27B with Native Support for Agentic Coding Tools

Ollama added the Qwen 3.6 27B model to its library, enabling local execution of the latest open-weight coding model. The update introduces direct integration with agentic frameworks like OpenClaw and Claude Code, allowing developers to run autonomous coding workflows entirely on local hardware.

Fireworks AI Adds Qwen 3.5 Training to Build Custom Reasoning Agents

Fireworks AIApr 30

Fireworks AI Adds Qwen 3.5 Training to Build Custom Reasoning Agents

Fireworks AI integrated Alibaba's Qwen 3.5 into its training platform, supporting full-parameter fine-tuning and reinforcement learning with a 256K context window. This allows developers to customize the high-performance open-weight model for specialized reasoning and coding tasks on a unified stack.

Cursor Details the Agent Harness Engineering That Drives Coding Performance

CursorApr 30

Cursor Details the Agent Harness Engineering That Drives Coding Performance

Cursor shared a technical deep dive into its agent harness, the orchestration layer that manages context, tools, and error correction for its AI coding agents. The update reveals that agent performance depends less on raw model power and more on specialized engineering like dynamic context discovery and model-specific tool tuning. This shift highlights why the harness is becoming the primary competitive moat for AI-native development tools.