Georgi on why it's still hard to get great coding agent performance from local models: "Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction"
Georgi Gerganov Recommends Qwen 3.5 to Solve Local Coding Agent Performance Issues
· Updated
Georgi Gerganov, the creator of llama.cpp, identified the Qwen 3.5 model series as a significant advancement for local AI development. While these models are highly capable, Gerganov noted that poor performance often stems from the harness—the specific implementation of chat templates and prompt construction required for agentic workflows.
This insight addresses a growing frustration among developers attempting to run autonomous agents like Claude Code or Codex using local models. Even with powerful hardware, subtle differences in how a model expects instructions can lead to failures that appear to be a lack of reasoning but are actually integration errors.
If you are building local coding agents, prioritize testing with the Qwen 3.5 family across your hardware. Success requires moving beyond generic prompts and ensuring your agentic framework is specifically tuned to the model's native chat template. This alignment is critical for achieving the reliability seen in proprietary frontier models.
Simon Willison
@simonw
5retweets159likes
View on X



