The top 5 labs in Text Arena rankings by category show that frontier models have distinct strengths and tradeoffs.
#1 @AnthropicAI, Claude Opus 4.7 - The most consistently dominant model overall, leading top-tier across nearly every major category.
#2 @GoogleDeepMind, Gemini 3.1 Pro - Well-rounded, with a notable edge in Creative Writing, ranked below Opus 4.7 and GPT-5.5 High in Expert.
#3 @AIatMeta, Muse Spark - Particularly strong in Overall and Coding, though it’s lagging behind in Expert tasks, Math, and Longer Query performance.
#4 @OpenAI, GPT-5.5 High - One of the most balanced models overall, staying competitive with the top two across most categories, with especially strong performance in Expert and Math.
#5 @xAI, Grok 4.20 - A more specialized profile, standing out primarily in Creative Writing and Hard Prompts, while lagging behind in Expert tasks.
Arena.ai Ranks Claude Opus 4.7 as the Most Dominant Frontier Model
Arena.ai, a community-driven platform for blind AI model evaluation, published its May 2026 rankings across 357 text models. Anthropic's claude-opus-4-7-thinking secured the #1 spot, continuing Claude's sweep of top coding rankings. The update uses 6 million human votes to identify frontier models (the most capable AI systems).
- Claude Opus 4.7 Thinking Elo: 1503
- Claude Opus 4.7 Thinking pricing (input): $5 per million tokens
- Claude Opus 4.7 Thinking context: 1M tokens
- Gemini 3.1 Pro pricing (input): $2 per million tokens
- GPT-5.5 High context: 1.1M tokens
The rankings highlight a shift toward specialized model profiles. While claude-opus-4-7-thinking is the only model in the top five across every category, competitors have carved out niches: OpenAI's GPT-5.5 leads in expert tasks, while xAI's grok-4.20 and Google's gemini-3.1-pro excel in creative writing.
Use these rankings to select the best model for your workflow. For agentic coding, Meta's muse-spark and claude-opus-4-7 are top performers. If your tasks require deep reasoning, GPT-5.5's specialized variants are the preferred choice. Full rankings, including pricing and context window (the amount of text a model can process at once) details, are live.
Arena.ai
@arena
Still wondering? A few quick answers below.
The Text Arena is a community-driven evaluation platform where users take part in blind human preference testing to rank large language models. It measures performance across open-ended domains like math, coding, and creative writing. With over 6 million votes across 357 models, it provides a preference-based ranking of the most capable AI systems, grounded in real-world usage.
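To give a feel for what an Elo figure like the 1503 listed above for Claude Opus 4.7 Thinking means in a blind head-to-head vote, here is a minimal sketch using the classic Elo logistic formula with a 400-point scale. This is an assumption for illustration: the article does not describe how Arena.ai actually computes its scores, and the rival rating below is a made-up value, not a published figure.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B in one blind vote,
    under the classic Elo logistic model (an assumption, not Arena's stated method)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


CLAUDE_OPUS_ELO = 1503.0         # Elo figure quoted in this article
HYPOTHETICAL_RIVAL_ELO = 1483.0  # hypothetical rating, for illustration only

p = expected_win_rate(CLAUDE_OPUS_ELO, HYPOTHETICAL_RIVAL_ELO)
print(f"Expected preference rate for the higher-rated model: {p:.3f}")
```

Under this model a 20-point gap corresponds to the higher-rated model being preferred in only slightly more than half of the votes, which is one reason small differences near the top of a leaderboard should be read with caution.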
Anthropic's Claude Opus 4.7 Thinking currently holds the top position on the leaderboard. It is recognized as the most consistently dominant model overall and is the only system to rank in the top five across every major evaluation category. It features a 1 million token context window and is priced at 5 dollars per million input tokens.
While Claude Opus 4.7 leads the overall rankings, OpenAI's GPT-5.5 High is noted for its balanced performance and specific strengths in expert-level tasks and mathematics. GPT-5.5 High has a slightly larger context window of 1.1 million tokens but carries a higher output price of 30 dollars per million tokens compared to 25 dollars for Claude.
Meta's Muse Spark is ranked third overall and shows particular strength in general tasks and coding. However, it lags behind leaders like Claude Opus 4.7 and GPT-5.5 High in specialized areas such as expert-tier reasoning, math, and handling longer queries. Its ranking is currently listed as preliminary on the leaderboard.
Pricing for frontier models varies by provider, with all prices quoted per million tokens. Claude Opus 4.7 costs 5 dollars for input and 25 dollars for output. Google's Gemini 3.1 Pro is more affordable at 2 dollars for input and 12 dollars for output. OpenAI's GPT-5.5 High is the most expensive of the three: it matches Claude at 5 dollars for input but charges 30 dollars for output.
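To turn these per-million-token prices into a per-request estimate, the arithmetic can be sketched as below. The prices are the ones quoted in this article; the workload size (200,000 input tokens and 20,000 output tokens) is a hypothetical example, and the sketch assumes flat per-token pricing with no caching discounts or tiered rates, which the article does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this article.
PRICES = {
    "Claude Opus 4.7": Pricing(5.0, 25.0),
    "Gemini 3.1 Pro": Pricing(2.0, 12.0),
    "GPT-5.5 High": Pricing(5.0, 30.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, assuming flat per-token pricing."""
    p = PRICES[model]
    total = input_tokens * p.input_per_million + output_tokens * p.output_per_million
    return total / 1_000_000


# Hypothetical workload: 200k input tokens, 20k output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 200_000, 20_000):.2f}")
# Claude Opus 4.7: $1.50
# Gemini 3.1 Pro: $0.64
# GPT-5.5 High: $1.60
```

For input-heavy workloads like this one, the gap between Claude Opus 4.7 and GPT-5.5 High stays small because the two share the same input price; the difference widens as the share of output tokens grows.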




