Google Research Benchmarks Gemini's 3D Object Generation Through Code

Google

Jun 7, 2026

Google Research introduced 3DCodeBench, a new benchmark evaluating AI models' ability to generate 3D objects using code. This benchmark, presented at CVPR2026, demonstrates how agentic AI can autonomously create complex 3D assets, highlighting the role of iterative refinement in improving model performance.

Google Research unveiled 3DCodeBench, a benchmark for evaluating AI models in agentic procedural 3D modeling via code. It assesses how well models, including Gemini variants, generate diverse 3D objects by writing and executing Blender Python scripts. The benchmark includes 212 object categories and evaluates 12 frontier VLMs (vision-language models processing text and images).

Models Evaluated: 12 frontier VLMs
Object Categories: 212
3D Objects with Code: 13,000
Multi-view Renders: 52,000
Ranking System: Human-preference Elo
Top Performing Model (Elo): GPT-5.5 (1167)

3DCodeBench reveals that while models can generate code, they often lack physical-world understanding, producing disconnected or misaligned structures. However, multi-turn refinement with deterministic feedback significantly improves performance, showing that the agentic harness is as crucial as the model itself. This emphasizes the importance of iterative processes in complex AI tasks.

The benchmark's 3DCodeArena provides human-preference Elo rankings for evaluated models, allowing comparison of performance and cost across paid frontier models. 3DCodeBench also offers open data for training and evaluation, including Blender code, multi-view renders, and LLM-generated captions, supporting further research and development in 3D content creation.

View the full update on 3dcodebench.com

Google Research

@GoogleResearchJun 6

See Project Astra 3D at the #CVPR2026 Google booth #557 at 5:30pm! The team presents 3DCodeBench, demonstrating Gemini models' proficiency in generating diverse 3D objects through code execution. Presented by Lei Shu & Yipeng Gao. https://t.co/XlZKytbZwE @GoogleDeepMind https://t.co/OmZ0bz4zVG

1199

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Google →

Keep reading

Google Launches Gemini 3.5 Flash with Frontier Performance for Agentic Coding

Google moved Gemini 3.5 Flash to general availability, positioning it as the strongest agentic and coding model in the Gemini family. The release delivers frontier-level performance at 4x the speed of comparable competitor models, though pricing has risen 3x versus the previous Gemini 3 Flash.

Gemini 3.1 Pro Tops ARC-AGI-2 Benchmark with 77.1% Score

Google DeepMindFeb 19

Gemini 3.1 Pro Tops ARC-AGI-2 Benchmark with 77.1% Score

Google released Gemini 3.1 Pro, scoring 77.1% on ARC-AGI-2 - more than double Gemini 3 Pro's score and ahead of Claude Opus 4.6. It's rolling out to the Gemini app, NotebookLM, and developer APIs including Gemini CLI and Antigravity.

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Google AI StudioMay 22

Google Gemini 3.5 Flash Beats Larger Models on Agentic Benchmark

Gemini 3.5 Flash has ranked first on the APEX-Agents-AA benchmark, outperforming larger frontier models in autonomous task execution. The result confirms that high-speed, low-cost models are now capable of handling complex agentic workflows previously reserved for larger architectures.

Arena.ai Ranks Google Gemini 3.5 Flash in Top Ten for Coding

ArenaMay 19

Arena.ai Ranks Google Gemini 3.5 Flash in Top Ten for Coding

Gemini 3.5 Flash has entered the Arena.ai leaderboards with a ninth-place ranking in both the overall Text and Frontend Coding categories. The model establishes a new price-performance frontier by delivering a 70-point jump in coding capability over its predecessor.