Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues

Simon WillisonSimon Willison

· Updated

Georgi Gerganov identified Qwen 3.5 as a major advancement for local coding tasks across various hardware sizes. He noted that disappointing performance in local agents often stems from the software harness and prompt construction rather than the model itself. This highlights the need for precise integration to match frontier-level agentic capabilities.

Georgi Gerganov, the creator of llama.cpp, identified the Qwen 3.5 model series as a significant advancement for local AI development. While these models are highly capable, Gerganov noted that poor performance often stems from the harness—the specific implementation of chat templates and prompt construction required for agentic workflows.

This insight addresses a growing frustration among developers attempting to run autonomous agents like Claude Code or Codex using local models. Even with powerful hardware, subtle differences in how a model expects instructions can lead to failures that appear to be a lack of reasoning but are actually integration errors.

If you are building local coding agents, prioritize testing with the Qwen 3.5 family across your hardware. Success requires moving beyond generic prompts and ensuring your agentic framework is specifically tuned to the model's native chat template. This alignment is critical for achieving the reliability seen in proprietary frontier models.

Simon Willison
Simon Willison
@simonw
X

Georgi on why it's still hard to get great coding agent performance from local models: "Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction"

5retweets159likes
View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Share this update