Arena.ai Data Shows Open Source Models Have Mostly Closed the Proprietary Gap

Arena

May 7, 2026 · Updated May 15, 2026

Arena.ai analyzed three years of human preference data and found that the performance lead held by proprietary models has shrunk from 250 points to just 30. While open-source models briefly took the lead on expert-level prompts in early 2025, proprietary systems have since regained a narrow but consistent edge.

Arena.ai tracked three years of human preference data and found that open-source models have mostly closed the performance gap with proprietary systems. In the Text Arena, the +250-point lead once held by the top closed-source model has collapsed to about +30 points. On the current leaderboard, that margin separates rank #1 from roughly rank #18.
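Arena scores are Elo-style ratings, so a point gap can be read as an expected head-to-head win rate. The sketch below assumes the standard Elo logistic conversion (base 10, 400-point scale). Arena's exact rating model and scaling may differ, and `expected_win_rate` is a hypothetical helper for illustration, not part of any Arena API.

```python
def expected_win_rate(score_gap: float, scale: float = 400.0, base: float = 10.0) -> float:
    """Expected probability that the higher-rated model wins a head-to-head vote.

    Assumes an Elo-style logistic curve (base 10, 400-point scale); this is an
    illustrative assumption, not Arena's confirmed rating formula.
    """
    return 1.0 / (1.0 + base ** (-score_gap / scale))


# Point gaps cited in the article (Arena score points).
gaps = [
    ("Text Arena, original proprietary lead", 250),
    ("Text Arena, current proprietary lead", 30),
    ("Expert prompts, current proprietary lead", 40),
]

for label, gap in gaps:
    print(f"{label}: +{gap} points -> {expected_win_rate(gap):.1%} expected win rate")
```

Under this assumption, a +250 gap corresponds to roughly an 81% expected win rate, while +30 is only about 54%. That is why the remaining gap is "small in points" even though it still spans many leaderboard ranks.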

While open-source models such as DeepSeek V4 Pro have reached near-parity on general tasks, "Expert" prompts remain the final frontier. Proprietary models maintain a +40-point lead here, the distance between rank #1 and rank #8. The gap briefly flipped in open source's favor in early 2025, but proprietary labs have since regained the lead.

For general applications, the performance difference is now marginal, suggesting that high-level reasoning is becoming commoditized. Proprietary models, however, still hold the top spot more consistently on complex tasks, a trend also reflected in the recent GPT-5.5 leaderboard rankings. The full historical data, filterable by use case, is available on arena.ai.

View the full update on arena.ai

Arena.ai

@arena · May 7

Have open source models closed the gap with proprietary ones? We've tracked three years of Arena data across three arenas.

The short answer: mostly yes.

In Text Arena, the proprietary winner had a +250 Arena lead. By early 2025, it had fallen to low double digits, and at its narrowest was almost closed entirely. Today, the proprietary lead is about +30 points. It separates #1 from roughly #18 on the current leaderboard.

- Open source has quickly closed most of the gap
- The biggest gains happened before 2025
- The remaining gap is small in points, but still large in rank

Get a deeper look into the race for Code Arena: Frontend and Expert prompts in the thread 🧵


Still wondering? A few quick answers below.

What is the performance gap between open source and proprietary AI models?

According to three years of Arena data, the gap has narrowed significantly. In the Text Arena, the lead held by proprietary models dropped from about 250 points to roughly 30 points today. While that point difference is small, it is still the margin separating the #1 model from roughly the #18 model on the current leaderboard.

How do open source models perform on expert-level prompts?

Expert prompts remain the most difficult category for open-source models. Proprietary systems currently hold a 40-point lead here, the distance between the first and eighth ranks. While open-source models have shown they can reach the top of the leaderboard on hard prompts, proprietary models have been more consistent at holding the number one spot.

Did open source models ever beat proprietary models on the Arena leaderboard?

Yes, briefly, in specific categories. In early 2025, DeepSeek R1 moved ahead of proprietary competitors on expert-level prompts, turning a narrow gap into a short-lived open-source lead. Proprietary models soon regained the top position and have generally been more consistent at holding the lead on the toughest prompts.

When did the most significant gains for open source models occur?

Arena data shows that most of the progress made by open-source models happened before 2025. Over that period, the 250-point lead held by the top proprietary model in the Text Arena fell to low double digits. In early 2025 the gap reached its narrowest point and nearly closed entirely, before widening again to the current proprietary lead of about 30 points.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Arena →

Keep reading

Arena.ai Data Shows US Lead Over Chinese AI Models Has Effectively Collapsed

Arena.ai's latest Text Arena data reveals that the performance gap between top US and Chinese AI models has shrunk from 278 to just 29 Elo points in three years. This real-world evidence confirms that Chinese labs have reached near-parity with frontier US systems despite hardware restrictions.
