Gemma-4 lands in Vision Arena as #2 & #4 open models, and shifts the Pareto frontier!
@GoogleDeepMind dominates the price-performance Pareto in Vision across both proprietary and open models.
- Gemma-4-31b ranks #2 open (#20 overall)
- Gemma-4-26b-a4b ranks #4 open (#26 overall)
The Vision Arena ranks multimodal AI models capable of reasoning over visual inputs.
Congrats to @GoogleDeepMind again on the open model progress!
Arena Ranks Google Gemma 4 Among Top Open Vision Models
Arena, a community-driven platform for evaluating AI models through human preference voting, added Google's Gemma-4 family to its Vision Arena leaderboard. The Gemma-4-31b model debuted as the #2 open-weight model. These multimodal models (AI that processes text and images simultaneously) are designed for advanced reasoning and agentic workflows.
- Gemma-4-31b rank: #2 open, #20 overall
- Gemma-4-26b-a4b rank: #4 open, #26 overall
- License: Apache 2.0
- Pricing (31b input): $0.14 per million tokens
- Pricing (31b output): $0.40 per million tokens
- Context window: 262.1K tokens
The rankings indicate that open-weight models are closing the gap with proprietary models in visual reasoning. By outperforming several versions of GPT-4o, Gemma-4 shifts the Pareto frontier, the set of models that no other model beats on both cost and capability at once. This follows Gemma 4's manual visual token controls and Arena's agentic coding leaderboard sweep.
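To make the Pareto frontier idea concrete, here is a minimal sketch of how one might compute it from a list of (price, score) pairs. The model names, prices, and scores below are made-up placeholders, not leaderboard values, and the `Model` type and `pareto_frontier` function are hypothetical helpers written for this illustration.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # USD per million tokens (placeholder value)
    score: float  # arena-style rating (placeholder value)


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return the models that no other model dominates.

    A model is dominated when another model is at least as cheap and
    at least as highly rated, and strictly better on one of the two.
    """
    frontier = []
    for m in models:
        dominated = any(
            o.price <= m.price
            and o.score >= m.score
            and (o.price < m.price or o.score > m.score)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    # Sort by price so the frontier reads from cheapest to most capable.
    return sorted(frontier, key=lambda m: m.price)


# Placeholder data only; none of these numbers come from Vision Arena.
models = [
    Model("model_a", 0.10, 1100.0),
    Model("model_b", 0.50, 1150.0),
    Model("model_c", 0.60, 1120.0),  # dominated by model_b
    Model("model_d", 2.00, 1250.0),
]

before = [m.name for m in pareto_frontier(models)]

# A new entrant that is cheaper and better than model_b pushes it off the frontier.
after = [m.name for m in pareto_frontier(models + [Model("model_e", 0.14, 1160.0)])]

print(before)
print(after)
```

In this toy data, adding `model_e` removes `model_b` from the frontier, which is the sense in which a cheap, strong new model "shifts" it.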
You can now deploy these models on private hardware under an Apache 2.0 license, making high-tier vision reasoning viable for local agentic loops. The 31b model is priced at $0.14 per million input tokens. This enables native multimodal capabilities in applications without the latency or privacy constraints of closed-source APIs.
Arena.ai
@arena
Still wondering? A few quick answers below.
The Vision Arena is a community-driven evaluation platform that ranks multimodal AI models based on their ability to reason over visual inputs. It uses a blind human preference voting system where users compare model outputs to determine Elo ratings. As of May 2026, the leaderboard includes over 120 proprietary and open-weight models.
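For readers unfamiliar with Elo ratings, the sketch below shows the textbook Elo update applied to a single pairwise vote. It is an illustration only: the article does not describe Arena's actual rating procedure, and the K-factor of 32 and the starting ratings are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote.

    k is the step size; 32 is a common textbook default, assumed here.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (s_a - e_a)
    # B's change mirrors A's, so the total rating is conserved.
    new_b = rating_b - k * (s_a - e_a)
    return new_a, new_b


# Two models start level; a voter prefers model A's answer.
new_a, new_b = elo_update(1000.0, 1000.0, a_wins=True)
print(new_a, new_b)
```

When two equally rated models meet, the expected score is 0.5, so the winner gains half of K and the loser gives up the same amount.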
Google's Gemma-4-31b debuted as the number two open-weight model and ranked twentieth overall. The smaller Gemma-4-26b-a4b version entered as the fourth-ranked open model and twenty-sixth overall. These rankings place Gemma 4 ahead of several proprietary frontier models, including specific versions of GPT-4o and Gemini 2.0 Flash.
Gemma 4 is a family of open-weight models released under the Apache 2.0 license. This licensing allows developers and researchers to download, run, and customize the models on their own hardware. Unlike proprietary models that require cloud API access, Gemma 4 provides the flexibility of local deployment for advanced reasoning and agentic workflows.
While Gemma 4 can be run locally without API fees, it is also available via API. The Gemma-4-31b model is priced at $0.14 per million input tokens and $0.40 per million output tokens. This pricing is significantly lower than that of many proprietary frontier models that offer comparable performance on visual reasoning tasks.
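As a rough illustration of what those per-token prices mean for a single request, the sketch below multiplies token counts by the quoted rates. The function name and the example token counts are hypothetical, and how image inputs are converted into tokens is not covered here, so the counts are assumed to be known in advance.

```python
# Prices quoted above for Gemma-4-31b, in USD per million tokens.
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.40


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for one request at the quoted prices."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical request: 10,000 input tokens and 1,000 output tokens.
cost = request_cost(10_000, 1_000)
print(f"${cost:.4f}")
```

At these rates, a request with 10,000 input tokens and 1,000 output tokens comes to about $0.0018.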
The Gemma 4 models feature a context window of 262,144 tokens, allowing them to process large amounts of information in a single request. They are natively multimodal, meaning they can reason across text and visual data simultaneously. The family includes a 31-billion parameter version and a 26-billion parameter version optimized for efficiency.




