Simon Willison Benchmarks IBM Granite 4.1 Quantizations for Spatial Reasoning

Simon Willison

May 5, 2026

Simon Willison tested 21 quantized variants of IBM's new Granite 4.1 3B model to see whether quantization file size correlates with SVG generation quality. He found no discernible relationship between quantization level and output quality, with even the smallest variants occasionally outperforming larger ones.

Willison benchmarked 21 quantized GGUF variants of IBM's Granite 4.1 3B model. GGUF is a compressed model file format, and these variants were published by Unsloth, a model optimization team. Using a single SVG generation prompt, he tested files ranging from 1.2GB to 6.34GB to see whether higher-precision quantizations improved spatial reasoning.
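A comparison of this kind can be sketched as a small harness that sends one prompt to every variant and records what comes back. The sketch below is illustrative only and is not Willison's tooling: the `generators` mapping stands in for whatever local inference call is used, and the well-formedness check is an assumed proxy, since the original comparison was judged by looking at the rendered images.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# The prompt quoted in the source post.
PROMPT = "Generate an SVG of a pelican riding a bicycle"


def extract_svg(text: str) -> Optional[str]:
    """Return the first <svg>...</svg> span in a model response, if any."""
    match = re.search(r"<svg\b.*?</svg>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


def is_wellformed_svg(text: str) -> bool:
    """Check that the response contains an SVG element that parses as XML.

    This says nothing about whether the drawing resembles a pelican.
    """
    svg = extract_svg(text)
    if svg is None:
        return False
    try:
        root = ET.fromstring(svg)
    except ET.ParseError:
        return False
    # The tag may carry a namespace, e.g. "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1].lower() == "svg"


def run_benchmark(generators: Dict[str, Callable[[str], str]]) -> Dict[str, bool]:
    """Run the same prompt against each variant (hypothetical callables)."""
    return {name: is_wellformed_svg(gen(PROMPT)) for name, gen in generators.items()}
```

Each value in `generators` would wrap one quantized file. Keeping the prompt fixed means any difference in output comes from the variant rather than the input.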

The results challenge the assumption that larger quantizations of small models are inherently more capable. While Willison previously found that local Qwen3.6 models could outperform frontier models at SVG tasks, these tests produced poor outputs across the board. This suggests that architecture matters more than compression levels.

For developers building local apps, these findings suggest that downloading the largest quantization of a 3B model may not provide a quality boost for complex reasoning. The IBM Granite 4.1 family is available under the Apache 2.0 license. You can access the quantized variants via Unsloth on Hugging Face.

View the full update on simonwillison.net

Simon Willison (@simonw), May 4

I tried running the same "Generate an SVG of a pelican riding a bicycle" prompt against 21 different quantized variants of the same IBM Granite 4.1 3B model - the results weren't as interesting as I had hoped https://t.co/rBvko3ZISM

Still wondering? A few quick answers below.

What is the IBM Granite 4.1 model family?

IBM Granite 4.1 is a collection of large language models released under the Apache 2.0 license. The family includes models in three sizes: 3B, 8B, and 30B parameters. These models are designed for instruction following and tool calling, and they are available as open weights for developers to run on their own hardware.

How does quantization affect the performance of IBM Granite 4.1 3B?

In a benchmark of 21 quantized variants, there was no clear correlation between file size and output quality for a complex spatial task, generating SVG code. Smaller quantized files, which use less memory, occasionally produced better results than larger versions of the same model, suggesting that architecture is more important than precision for these tasks.

What are GGUF quantized variants of LLMs?

GGUF is a file format used to store quantized versions of large language models, allowing them to run on consumer hardware with limited RAM. Quantization compresses the model by reducing the precision of its weights. For the Granite 4.1 3B model, these variants range from 1.2GB to over 6GB, representing different levels of compression and performance.
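To make the size range concrete, the sketch below does two things. It estimates average bits per weight from a file size, and it shows a toy round trip through low-precision integers. Both parts rest on assumptions: the parameter count is taken as exactly 3 billion (the "3B" label is nominal, and files also hold metadata), and the quantizer is a plain symmetric uniform scheme chosen for illustration. Real GGUF quantization types are more elaborate, using block-wise scales and mixed precision.

```python
from typing import List


def bits_per_weight(file_size_gb: float, n_params: float = 3e9) -> float:
    """Rough average bits per weight, assuming n_params parameters (assumption)."""
    return file_size_gb * 1e9 * 8 / n_params


def quantize_roundtrip(weights: List[float], bits: int) -> List[float]:
    """Quantize to signed `bits`-bit integers and back (toy scheme, bits >= 2)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / levels or 1.0  # 1.0 if all zeros
    return [round(w / scale) * scale for w in weights]


def max_error(weights: List[float], bits: int) -> float:
    """Largest absolute difference introduced by the round trip."""
    restored = quantize_roundtrip(weights, bits)
    return max(abs(a - b) for a, b in zip(weights, restored))
```

Under the 3-billion-parameter assumption, `bits_per_weight(1.2)` comes out near 3.2 bits and `bits_per_weight(6.34)` near 17 bits. The second figure exceeds 16, which hints that the true parameter count is somewhat above 3 billion, so both numbers are only rough guides.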

Is IBM Granite 4.1 open source?

Yes, the IBM Granite 4.1 family of models is released under the Apache 2.0 license. This is a permissive open-source license that allows individuals and organizations to use, modify, and distribute the models for both research and commercial purposes. The model weights are publicly available for download and local deployment.

Where can I download quantized versions of IBM Granite 4.1?

Quantized versions of the IBM Granite 4.1 3B model are available through the Unsloth collection on Hugging Face. These files are provided in the GGUF format, which is compatible with various local inference tools. The collection includes 21 different variants, allowing users to choose a model size that fits their specific hardware constraints and memory limits.
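Choosing a variant for a given machine can be sketched as a simple filter on file size. Everything named here is hypothetical: the variant labels are placeholders rather than real file names, only the 1.2GB and 6.34GB figures come from the benchmark, and the 1GB headroom for context and runtime overhead is an assumed default.

```python
from typing import Dict, Optional


def pick_variant(
    sizes_gb: Dict[str, float],
    budget_gb: float,
    headroom_gb: float = 1.0,
) -> Optional[str]:
    """Return the largest variant whose file size plus headroom fits the budget.

    Returns None when nothing fits. Labels and headroom are illustrative.
    """
    fitting = {
        name: size
        for name, size in sizes_gb.items()
        if size + headroom_gb <= budget_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Placeholder labels; the middle entry is an invented size for illustration.
EXAMPLE_SIZES = {"smallest": 1.2, "middle-hypothetical": 3.0, "largest": 6.34}
```

Given the benchmark's finding that larger files did not reliably draw better SVGs, taking the largest file that fits is only one possible policy. A smaller file may give up little on tasks like this one.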


Keep reading

Simon Willison Benchmarks ChatGPT Images 2.0 Spatial Reasoning and Detail Gains

Simon Willison tested OpenAI's new ChatGPT Images 2.0 using a custom Where's Waldo style benchmark to evaluate spatial reasoning and detail retention. His findings suggest the model significantly outperforms previous versions in rendering dense, complex scenes with precise object placement.

Google Research Benchmarks Gemini's 3D Object Generation Through Code

Google, Jun 7

Google Research introduced 3DCodeBench, a new benchmark evaluating AI models' ability to generate 3D objects using code. This benchmark, presented at CVPR 2026, demonstrates how agentic AI can autonomously create complex 3D assets, highlighting the role of iterative refinement in improving model performance.

Arena Ranks Google Gemma 4 as Top Open Vision Model

Arena, May 8

Google's Gemma-4-31b and Gemma-4-26b-a4b have entered the Vision Arena leaderboard as the #2 and #4 ranked open models. These releases shift the price-performance frontier by delivering vision reasoning capabilities that rival proprietary systems at a fraction of the cost.

LMSYS Org Adds Day Zero SGLang Support for Qwen3.6-27B Reasoning Model

LMSYS Org, Apr 24

LMSYS Org integrated immediate support for Qwen3.6-27B into its SGLang inference framework, enabling high-speed serving of the new 27-billion parameter model. The model outperforms the massive Qwen3.5-397B-A17B on coding benchmarks and introduces native thinking modes for complex reasoning.
