Gemma-4 lands in Vision Arena as #2 & #4 open models, and shifts the Pareto frontier! @GoogleDeepMind dominates the price-performance Pareto in Vision across both proprietary and open models.
- Gemma-4-31b ranks #2 open (#20 overall)
- Gemma-4-26b-a4b ranks #4 open (#26 overall)
The Vision Arena ranks multimodal AI models capable of reasoning over visual inputs. Congrats to @GoogleDeepMind again on the open model progress!
Arena Ranks Google Gemma 4 Among Top Open Vision Models
Arena· Updated
Google's Gemma-4-31b and Gemma-4-26b-a4b have entered the Vision Arena leaderboard as the #2 and #4 ranked open models. These releases shift the price-performance frontier by delivering vision reasoning capabilities that rival proprietary systems at a fraction of the cost.
The Gemma-4-31b model debuted as the #2 open-weight model. These multimodal models (AI that processes text and images simultaneously) are designed for advanced reasoning and agentic workflows.
- Gemma-4-31b rank: #2 open, #20 overall
- Gemma-4-26b-a4b rank: #4 open, #26 overall
- License: Apache 2.0
- Pricing (31b input): $0.14 per million tokens
- Pricing (31b output): $0.40 per million tokens
- Context window: 262.1K tokens
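To make the listed per-token prices concrete, here is a minimal sketch that estimates the cost of a workload at the Gemma-4-31b rates quoted above. The workload (1,000 requests of 2,000 input and 500 output tokens each) is a hypothetical example, and the sketch assumes image inputs are already counted in the input token total; how a given provider converts images to tokens is not covered here.

```python
# Prices quoted in the article for Gemma-4-31b (USD per million tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.40


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed prices."""
    total_micro_usd = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    )
    return total_micro_usd / 1_000_000


# Hypothetical workload (an assumption, not from the article):
# 1,000 requests, each with 2,000 input tokens and 500 output tokens.
total = 1_000 * request_cost(2_000, 500)
print(f"${total:.2f}")  # prints "$0.48"
```

At these rates, output tokens cost roughly three times as much as input tokens, so response length tends to matter more than prompt length for the final bill.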
The rankings indicate that open-weight models are closing the gap with proprietary systems in visual reasoning. By outperforming several versions of GPT-4o, Gemma-4 shifts the Pareto frontier, the set of models for which no alternative is both cheaper and more capable. This follows Gemma 4's manual visual token controls and Arena's agentic coding leaderboard sweep.
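The idea of a price-performance Pareto frontier can be sketched in a few lines. The example below is illustrative only: the model names, prices, and scores are made-up placeholders, not Arena data, and the `Model` type and `pareto_frontier` function are hypothetical helpers written for this sketch.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # USD per million input tokens (lower is better)
    score: float  # leaderboard score (higher is better)


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both price and score.

    A model is dropped if some other model is at least as cheap and at
    least as strong. Exact ties keep only the first model encountered.
    """
    frontier: list[Model] = []
    best_score = float("-inf")
    # Visit cheapest first; among equal prices, strongest first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        # Keep a model only if it beats every cheaper model's score.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Placeholder data (assumed values for illustration, not real rankings).
candidates = [
    Model("model-a", 0.10, 1100),
    Model("model-b", 0.50, 1150),
    Model("model-c", 0.60, 1120),  # pricier and weaker than model-b
    Model("model-d", 2.00, 1200),
]

print([m.name for m in pareto_frontier(candidates)])
# prints ['model-a', 'model-b', 'model-d']
```

A new model "shifts" the frontier when it joins this set and pushes out models that were previously on it, for example by matching their score at a lower price.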
You can now deploy these models on private hardware under an Apache 2.0 license, making high-tier vision reasoning viable for local agentic loops. The 31b model is priced at $0.14 per million input tokens. This enables native multimodal capabilities in applications without the latency or privacy constraints of closed-source APIs.




