Anthropic Claude Models Sweep Top Five Spots in Arena Coding Leaderboard

Arena

May 7, 2026 · Updated Jun 5, 2026

Arena.ai's latest Image-to-WebDev leaderboard shows Anthropic's Claude models occupying the entire top five, with Claude Opus 4.7 Thinking at #1. The update reflects rapid turnover in agentic coding rankings, as older entries from OpenAI and Google drop out of the top 10.

Arena.ai, a community-driven evaluation platform, reported a major turnover in its Image-to-WebDev leaderboard for agentic coding (AI that autonomously writes and iterates on code). Half of the top 10 is new this month. Anthropic now holds the top five positions, led by Claude Opus 4.7 Thinking at #1, about 30 points ahead of Claude Sonnet 4.6.

#1 Rank: Claude Opus 4.7 Thinking
Top 5 Sweep: Anthropic Claude models
OpenAI Entry: GPT-5.5 (#6 and #8)
Google Entry: Gemini-3.1 Pro (#7)
Alibaba Entry: Qwen-3.6 Plus (#9)
Score Gap: About 30 points (Opus 4.7 Thinking over Sonnet 4.6)
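
To give a rough sense of what a gap of about 30 points could mean in head-to-head terms, here is a minimal Python sketch. It assumes the leaderboard scores behave like Elo ratings on the conventional 400-point logistic scale; the article does not state how Arena.ai computes its scores, so both the scale and the function name `expected_win_rate` are illustrative assumptions rather than the platform's actual method.

```python
def expected_win_rate(score_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model in a head-to-head matchup.

    Assumption: scores follow the standard Elo logistic model with a
    400-point scale. This is not confirmed for Arena.ai's leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


# Gap reported between Opus 4.7 Thinking and Sonnet 4.6 (approximate).
gap = 30.0
print(f"{expected_win_rate(gap):.3f}")  # prints 0.543
```

Under that assumption, a 30-point lead would correspond to the higher-rated model being preferred in roughly 54% of direct comparisons, which is a modest edge per matchup rather than a lopsided one.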

This shift highlights how quickly models are improving at coding tasks that start from visual inputs. GPT-5.5 entered at #6 and #8, while older OpenAI entries such as GPT-5.4 and GPT-5.3 Codex are no longer in the top 10. With a "Thinking" variant at #1, the results suggest that inference-time compute (allocating more processing power during generation) may be an important differentiator, although the standard Opus 4.7 also placed at #3.

For developers, the results point to Claude as the current leader for visual-to-code workflows. Google remains in the top 10, with Gemini-3.1 Pro at #7 and Gemini-3 Pro at #10, and Alibaba's Qwen-3.6 Plus enters at #9. The rankings come from a community-driven platform and offer one reference point when selecting models for complex, multi-step frontend implementation.


Arena.ai (@arena) · May 7

Code Arena's frontend leaderboard for models using visual inputs in agentic coding has turned over fast. Half the top 10 is new this month, with Claude setting the pace and older OpenAI and Gemini entries no longer in the top 10.

- Claude by @AnthropicAI now takes all the top five. Opus 4.7 Thinking enters at #1, about 30 points ahead of Sonnet 4.6, while Opus 4.7 also lands at #3.
- Claude 4.6 models mostly improved in score, but lost rank due to new 4.7 models moving the ceiling higher.
- Older GPT-5.4 and GPT-5.3 Codex entries from @OpenAI are no longer in the top, while GPT-5.5 enters at #6 and #8.
- Gemini by @GoogleDeepMind remains in the top 10 but has been pushed down: Gemini-3.1 Pro falls to #7, Gemini-3 Pro to #10, and Gemini-3 Flash drops out.
- Qwen-3.6 Plus by @Alibaba_Qwen enters at #9, adding another new provider to the updated top 10.

Still wondering? A few quick answers below.

What is the Arena.ai Image-to-WebDev leaderboard?

The Image-to-WebDev leaderboard is a specialized evaluation within Arena.ai that ranks AI models on agentic coding tasks that use visual inputs. It measures how well a model can take an image, such as a UI mockup, and autonomously generate the frontend code that implements that design.

Which AI model is currently ranked #1 for coding on Arena.ai?

Claude Opus 4.7 Thinking from Anthropic is currently the top-ranked model on the leaderboard. It debuted at #1, scoring about 30 points higher than Claude Sonnet 4.6. It is part of a broader sweep in which Anthropic models occupy all of the top five positions.

How did OpenAI and Google models perform in the latest Code Arena update?

OpenAI and Google models lost ground in the latest rankings. Older GPT-5.4 and GPT-5.3 Codex entries fell out of the top 10 entirely, though the newer GPT-5.5 entered at #6 and #8. Google's Gemini-3.1 Pro dropped to #7, Gemini-3 Pro fell to #10, and Gemini-3 Flash dropped out of the top 10.

What is agentic coding in the context of the Arena.ai leaderboard?

Agentic coding refers to AI systems that autonomously plan, reason about, and execute multi-step coding tasks rather than providing single-turn code completions. On this leaderboard, it involves models using visual reasoning to carry out web development tasks, such as implementing a frontend interface from a provided image or design mockup.

Which new AI models entered the top 10 of the Code Arena this month?

Half of the top 10 models are new this month. Key entries include Claude Opus 4.7 Thinking at #1 and Opus 4.7 at #3. OpenAI's GPT-5.5 entered at #6 and #8, while Alibaba's Qwen-3.6 Plus took the #9 spot, adding a new provider to the top 10.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Arena.ai Ranks Claude Opus 4.7 as the Most Dominant Frontier Model

Arena.ai released its latest Text Arena rankings based on over 6 million community votes, placing Anthropic's Claude Opus 4.7 Thinking at the top of the leaderboard. The data reveals that while overall scores are tightening, models are developing specialized strengths in areas like creative writing, math, and expert-level reasoning.

Claude Opus 4.8 takes top spot on agentic work benchmark

Artificial Analysis · Jun 1

Anthropic's Claude Opus 4.8 has claimed the lead on the GDPval-AA leaderboard for agentic professional tasks. The model achieved an 1890 Elo rating, demonstrating a 67% win rate against GPT-5.5 xhigh in real-world work scenarios. This update establishes a new performance ceiling for AI agents capable of producing complex office deliverables.
