Gemini 3.1 Pro Tops ARC-AGI-2 Benchmark with 77.1% Score

Google DeepMind


Google has released Gemini 3.1 Pro, which scores 77.1% on ARC-AGI-2 - more than double Gemini 3 Pro's score and ahead of Claude Opus 4.6. It's rolling out to the Gemini app, NotebookLM, and developer tools including the Gemini API, Gemini CLI, and Antigravity.

Gemini 3.1 Pro is Google's upgraded reasoning model in the Gemini 3 series. On ARC-AGI-2 - a benchmark that tests a model's ability to solve entirely novel logic patterns - it scored 77.1% in Thinking (High) mode, more than double Gemini 3 Pro's 31.1% and ahead of Claude Opus 4.6 (68.8%). The upgrade targets core reasoning rather than expanded multimodal features.

The practical difference shows in tasks where simple answers fall short: building a live aerospace dashboard from a public telemetry stream, generating interactive animated SVGs in pure code, or reasoning through literary tone to design a functional portfolio site. These are existing capabilities, strengthened by better underlying reasoning.

Gemini 3.1 Pro is available in preview via the Gemini API in Google AI Studio, Gemini CLI, and Antigravity. Consumer access is rolling out through the Gemini app and NotebookLM on Pro and Ultra plans.
