Have open source models closed the gap with proprietary ones? We've tracked three years of Arena data across three arenas.

The short answer: mostly yes. In Text Arena, the proprietary leader once held a +250-point lead. By early 2025, that lead had fallen to low double digits, and at its narrowest the gap was almost closed entirely. Today, the proprietary lead is about +30 points, which separates #1 from roughly #18 on the current leaderboard.

- Open source has quickly closed most of the gap
- The biggest gains happened before 2025
- The remaining gap is small in points, but still large in rank

Get a deeper look into the race for Code Arena: Frontend and Expert prompts in the thread 🧵
Arena.ai Data Shows Open Source Models Have Mostly Closed the Proprietary Gap
Arena · Updated
Arena.ai analyzed three years of human preference data and found that the lead held by the top proprietary model in Text Arena has shrunk from 250 points to about 30. Open-source models briefly took the lead on expert-level prompts in early 2025, but proprietary systems have since regained a narrow, consistent edge.
Open-source models like DeepSeek V4 Pro have reached near parity on general tasks, but "Expert" prompts remain the final frontier. Proprietary models hold a +40 point lead there, which is the distance between rank #1 and rank #8. The gap briefly flipped in favor of open source in early 2025 before proprietary labs regained the lead.
For general applications, the performance difference is now marginal, which suggests that high-level reasoning is becoming commoditized. Proprietary models still offer more consistent results on complex tasks, a trend also seen in recent GPT-5.5 leaderboard rankings. The full historical data, filterable by use case, is available online.
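To put the point gaps in perspective, here is a minimal sketch that converts a score gap into an expected head-to-head win rate. It assumes Arena scores behave like Elo ratings on the conventional 400-point logistic scale and ignores ties. The source does not describe the scoring formula, so both the function name `expected_win_rate` and the scale are illustrative assumptions, not Arena's published method.

```python
def expected_win_rate(gap: float, scale: float = 400.0) -> float:
    """Expected win probability of the higher-rated model.

    Assumes an Elo-style logistic curve where a gap of `scale` points
    means 10:1 odds. This is an assumption, not Arena's stated formula.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


# Point gaps quoted in the article (labels are ours, for illustration).
gaps = {
    "Text Arena, early lead": 250,
    "Expert prompts, today": 40,
    "Text Arena, today": 30,
}

for label, gap in gaps.items():
    print(f"{label}: +{gap} points -> {expected_win_rate(gap):.1%} expected win rate")
```

Under these assumptions, a +250 gap means the leader would win roughly four out of five comparisons, while a +30 or +40 gap is only a few points better than a coin flip. That fits the claim that the remaining gap is small in points even though it spans many leaderboard ranks.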