Arena.ai Ranks Claude Opus 4.7 as the Most Dominant Frontier Model

Arena

May 12, 2026 · Updated Jun 5, 2026

Arena.ai released its latest Text Arena rankings based on over 6 million community votes, placing Anthropic's Claude Opus 4.7 Thinking at the top of the leaderboard. The data reveals that while overall scores are tightening, models are developing specialized strengths in areas like creative writing, math, and expert-level reasoning.

Arena.ai, a community-driven platform for blind AI model evaluation, published its May 2026 rankings across 357 text models. Anthropic's claude-opus-4-7-thinking secured the #1 spot, continuing Claude's sweep of top coding rankings. The update draws on more than 6 million human votes to identify frontier models (the most capable AI systems).

Claude Opus 4.7 Thinking Elo: 1503
Claude Opus 4.7 Thinking Pricing (input): $5 per million tokens
Claude Opus 4.7 Thinking Context: 1M tokens
Gemini 3.1 Pro Pricing (input): $2 per million tokens
GPT-5.5 High Context: 1.1M tokens

The rankings highlight a shift toward specialized model personalities. While claude-opus-4-7-thinking is the only model in the top five across every category, competitors have their own niches. OpenAI's GPT-5.5 leads in expert tasks, while xAI's grok-4.20 and Google's gemini-3.1-pro excel in creative writing.

Use these rankings to select the best model for your workflow. For agentic coding, Meta's muse-spark and claude-opus-4-7 are top performers. If your tasks require deep reasoning, GPT-5.5's specialized variants are the preferred choice. Full rankings, including pricing and context window (the data a model processes at once) details, are live.

View the full update on arena.ai

Arena.ai

@arena · May 12

The top 5 labs in Text Arena rankings by category show that frontier models have distinct strengths and tradeoffs.

#1 @AnthropicAI, Claude Opus 4.7 - The most consistently dominant model overall, leading top-tier across nearly every major category.
#2 @GoogleDeepMind, Gemini 3.1 Pro - Well-rounded, with a notable edge in Creative Writing, ranked below Opus 4.7 and GPT-5.5 High in Expert.
#3 @AIatMeta, Muse Spark - Particularly strong in Overall and Coding, though it’s lagging behind in Expert tasks, Math, and Longer Query performance.
#4 @OpenAI, GPT-5.5 High - One of the most balanced models overall, staying competitive with the top two across most categories, with especially strong performance in Expert and Math.
#5 @xAI, Grok 4.20 - A more specialized profile, standing out primarily in Creative Writing and Hard Prompts, while lagging behind in Expert tasks.


Still wondering? A few quick answers below.

What is the Arena.ai Text Arena leaderboard?

The Text Arena is a community-driven evaluation platform where users take part in blind human preference testing to rank large language models. It measures performance across open-ended domains like math, coding, and creative writing. With over 6 million votes across 357 models, it provides a ranking of the most capable AI systems based on real-world usage.
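To make the link between blind votes and a score like "Elo: 1503" concrete, here is a minimal sketch of the classic Elo update applied to pairwise preference votes. It is an illustration only: the article does not describe Arena.ai's actual rating method, and the model names, starting rating of 1000, and K-factor of 32 are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after a single blind vote.

    k is an assumed K-factor; it controls how far one vote moves a rating.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical votes: (model shown as A, model shown as B, did A win?)
votes = [
    ("model-x", "model-y", True),
    ("model-y", "model-z", True),
    ("model-x", "model-z", True),
]

# Every model starts from the same assumed baseline rating.
ratings = {name: 1000.0 for name in ("model-x", "model-y", "model-z")}
for a, b, a_won in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

# Highest rating first.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Under this model a 400-point gap corresponds to roughly a 10-to-1 preference ratio, so small differences between top scores imply close to even odds in any single head-to-head vote.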

Which AI model is currently ranked number one for text performance?

Anthropic's Claude Opus 4.7 Thinking currently holds the top position on the leaderboard. It is recognized as the most consistently dominant model overall and is the only system to rank in the top five across every major evaluation category. It features a 1 million token context window and is priced at 5 dollars per million input tokens.

How does GPT-5.5 High compare to Claude Opus 4.7?

While Claude Opus 4.7 leads the overall rankings, OpenAI's GPT-5.5 High is noted for its balanced performance and specific strengths in expert-level tasks and mathematics. GPT-5.5 High has a slightly larger context window of 1.1 million tokens but carries a higher output price of 30 dollars per million tokens compared to 25 dollars for Claude.

What are the strengths and weaknesses of Meta Muse Spark?

Meta's Muse Spark is ranked third overall and shows particular strength in general tasks and coding. However, it lags behind leaders like Claude Opus 4.7 and GPT-5.5 High in specialized areas such as expert-tier reasoning, math, and handling longer queries. Its ranking is listed as preliminary on the leaderboard.

What is the pricing for the top-ranked AI models on Arena.ai?

Pricing for frontier models varies by provider, and all prices here are per million tokens. Claude Opus 4.7 costs 5 dollars for input and 25 dollars for output. Google's Gemini 3.1 Pro is more affordable at 2 dollars for input and 12 dollars for output. OpenAI's GPT-5.5 High matches Claude's 5 dollars for input but is the most expensive on output at 30 dollars.
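As a rough illustration of how these per-million-token prices add up, the sketch below estimates the bill for one workload. The prices are the ones quoted in this article; the workload size (2 million input tokens, 500,000 output tokens) and the function name are hypothetical.

```python
# (input price, output price) in US dollars per million tokens, as quoted above.
PRICES = {
    "Claude Opus 4.7": (5.0, 25.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "GPT-5.5 High": (5.0, 30.0),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
costs = {m: workload_cost(m, 2_000_000, 500_000) for m in PRICES}
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model}: ${cost:.2f}")
# Gemini 3.1 Pro: $10.00
# Claude Opus 4.7: $22.50
# GPT-5.5 High: $25.00
```

Because output tokens cost several times more than input tokens for all three models, output-heavy workloads widen the gap between providers more than the input prices alone suggest.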

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →


Keep reading

Anthropic Claude Models Sweep Top Five Spots in Arena Coding Leaderboard

Arena.ai's latest Image-to-WebDev leaderboard shows Anthropic's Claude models occupying the entire top five, with Claude Opus 4.7 Thinking taking the #1 position. The shift highlights a rapid turnover in agentic coding performance as older frontier models from OpenAI and Google fall out of the top rankings.

Artificial Analysis crowns Claude Opus 4.8 as the new intelligence leader

Artificial Analysis · May 31

Artificial Analysis has ranked Claude Opus 4.8 as the new leader on its Intelligence Index, surpassing GPT-5.5 (xhigh). The model shows significant gains in agentic workflows and scientific reasoning while maintaining lower hallucination rates than its peers. This shift marks a return to the top for Anthropic in independent frontier model evaluations.

