I tried running the same "Generate an SVG of a pelican riding a bicycle" prompt against 21 different quantized variants of the same IBM Granite 4.1 3B model - the results weren't as interesting as I had hoped https://t.co/rBvko3ZISM
Simon Willison Benchmarks IBM Granite 4.1 Quantizations for Spatial Reasoning
Simon Willison benchmarked 21 quantized GGUF variants (compressed model files) of the IBM Granite 4.1 3B model. The variants were published by Unsloth, a model optimization team. Using a single SVG generation prompt, he tested files ranging from 1.2GB to 6.34GB to see whether higher-precision quantizations improved spatial reasoning.

The results challenge the assumption that larger quantizations of small models are inherently more capable. While Willison previously found that local Qwen3.6 models could outperform frontier models at SVG tasks, these tests produced poor outputs across the board. This suggests that, for this kind of task, a model's architecture matters more than its compression level.
For developers building local apps, these findings suggest that downloading the largest quantization of a 3B model may not provide a quality boost for complex reasoning. The IBM Granite 4.1 family is available under the Apache 2.0 license. You can access the quantized variants via Unsloth on Hugging Face.
Simon Willison
@simonw
Still wondering? A few quick answers below.
IBM Granite 4.1 is a collection of large language models released under the Apache 2.0 license. The family includes models in three sizes: 3B, 8B, and 30B parameters. These models are designed for instruction following and tool calling, and they are available as open weights for developers to run on their own hardware.
In a benchmark of 21 quantized variants of the Granite 4.1 3B model, there was no clear correlation between quantization file size and output quality for complex spatial tasks like generating SVG code. Smaller quantized files, which use less memory, occasionally produced better results than larger files of the same model, suggesting that architecture is more important than precision for these tasks.
GGUF is a file format used to store quantized versions of large language models, allowing them to run on consumer hardware with limited RAM. Quantization compresses the model by reducing the precision of its weights. For the Granite 4.1 3B model, these variants range from 1.2GB to over 6GB, representing different levels of compression and performance.
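To make "reducing the precision of its weights" concrete, here is a minimal Python sketch of symmetric round-to-nearest quantization on one block of weights, plus a rough file-size estimate. It is a simplified illustration, not the scheme GGUF actually uses (real GGUF quantization types are more elaborate), and the function names, the sample weights, and the round figure of 3 billion parameters are all assumptions made for the example.

```python
def quantize_block(weights, bits):
    """Symmetric round-to-nearest quantization of one block of floats.

    Simplified illustration only; real GGUF quantization types differ.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0
    # Each weight becomes a small integer code; only codes + scale are stored.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def max_abs_error(weights, bits):
    """Largest difference between original and restored weights."""
    codes, scale = quantize_block(weights, bits)
    restored = dequantize_block(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


def approx_file_size_gb(n_params, bits_per_weight):
    """Rough size estimate; ignores metadata and per-block scale overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical sample weights, chosen only for illustration.
sample = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.41, -0.88]

if __name__ == "__main__":
    for bits in (4, 8):
        print(bits, "bits -> max error", max_abs_error(sample, bits))
    # Assumes a round 3e9 parameters; the real count is not given here.
    print("16-bit estimate (GB):", approx_file_size_gb(3e9, 16))
```

Under that assumed parameter count, 16 bits per weight works out to roughly 6GB, in the neighborhood of the largest file mentioned above, while a 1.2GB file would correspond to roughly 3.2 bits per weight. Fewer bits mean a coarser grid of representable values, which is why the restored weights drift further from the originals.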
Yes, the IBM Granite 4.1 family of models is released under the Apache 2.0 license. This is a permissive open-source license that allows individuals and organizations to use, modify, and distribute the models for both research and commercial purposes. The model weights are publicly available for download and local deployment.
Quantized versions of the IBM Granite 4.1 3B model are available through the Unsloth collection on Hugging Face. These files are provided in the GGUF format, which is compatible with various local inference tools. The collection includes 21 different variants, allowing users to choose a model size that fits their specific hardware constraints and memory limits.
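Choosing a variant for a given machine can be sketched as a small selection function. The Python below is a hypothetical helper, not part of any Unsloth or Hugging Face tooling: the variant names, the middle file size, and the 1GB headroom default (a guess at space needed for context and runtime overhead) are assumptions for illustration. Only the 1.2GB and 6.34GB endpoints come from the benchmark described above.

```python
def pick_variant(variants, memory_budget_gb, headroom_gb=1.0):
    """Return the name of the largest variant that fits, or None.

    variants: mapping of variant name -> file size in GB.
    headroom_gb: assumed allowance for context and runtime overhead.
    """
    usable = memory_budget_gb - headroom_gb
    fitting = {name: size for name, size in variants.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Hypothetical catalogue: names and the middle size are made up.
example_variants = {"smallest": 1.2, "middle": 2.1, "largest": 6.34}

if __name__ == "__main__":
    for budget in (2.0, 4.0, 8.0):
        print(budget, "GB ->", pick_variant(example_variants, budget))
```

Picking the largest file that fits is only one possible policy. Given that the benchmark found no clear quality gain from larger files, a smaller variant may serve equally well while leaving more memory free.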




