Tencent Hunyuan Open-Sources PlanningBench to Advance LLM Planning Capabilities

Tencent Hunyuan

Tencent Hunyuan, in collaboration with Renmin University of China, open-sourced PlanningBench, a framework for evaluating and training large language model (LLM) planning capabilities. This release aims to help LLMs move from generating text to autonomously executing multi-step actions in complex scenarios.

The collaboration is with the Gaoling School of Artificial Intelligence at Renmin University of China, and the framework includes more than 30 real-world planning tasks.
Evaluation features: automated verification, adaptive difficulty control, instance-level verification checklists
Training support: reinforcement learning on verified data
Task domains: Scheduling & Timetabling (28.42%), Project & Production Operations (21.90%), Routing & Traveling (17.21%), and 3 more

Planning is a fundamental capability for AI agents, enabling them to coordinate goals, constraints, and resources to achieve complex objectives. PlanningBench's evaluations show that current frontier models still struggle to produce complete solutions under coupled constraints, a challenge also observed in benchmarks like ITBench-AA for agentic tasks.
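To make "instance-level verification checklists" and "coupled constraints" concrete, here is a minimal sketch of how a candidate plan might be checked against a per-instance checklist. It is not PlanningBench's actual code or data format: the plan representation, the `Check` class, the task names, and the working-hours rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical plan format (an assumption, not PlanningBench's schema):
# task name -> (start_hour, end_hour, room)
Plan = Dict[str, Tuple[int, int, str]]


@dataclass
class Check:
    """One checklist item: a name plus a predicate over the whole plan."""
    name: str
    passed: Callable[[Plan], bool]


def all_tasks_scheduled(plan: Plan) -> bool:
    # Every required task must appear in the plan.
    return {"design", "review", "build"} <= set(plan)


def within_working_hours(plan: Plan) -> bool:
    # Each task must start before it ends and fit inside 9:00-17:00.
    return all(9 <= start < end <= 17 for start, end, _ in plan.values())


def no_room_overlap(plan: Plan) -> bool:
    # Sort by room, then start time; any overlap shows up between neighbours.
    slots = sorted(plan.values(), key=lambda s: (s[2], s[0]))
    for (_, end1, room1), (start2, _, room2) in zip(slots, slots[1:]):
        if room1 == room2 and start2 < end1:
            return False
    return True


def design_before_review(plan: Plan) -> bool:
    # Precedence constraint: review may only start once design has ended.
    if "design" not in plan or "review" not in plan:
        return False
    return plan["review"][0] >= plan["design"][1]


CHECKLIST: List[Check] = [
    Check("all_tasks_scheduled", all_tasks_scheduled),
    Check("within_working_hours", within_working_hours),
    Check("no_room_overlap", no_room_overlap),
    Check("design_before_review", design_before_review),
]


def verify(plan: Plan, checklist: List[Check]) -> Dict[str, bool]:
    """Run every checklist item and report pass/fail per item."""
    return {check.name: check.passed(plan) for check in checklist}


# A candidate plan that looks plausible but violates two coupled constraints.
candidate: Plan = {
    "design": (9, 11, "A"),
    "review": (10, 12, "A"),
    "build": (13, 16, "B"),
}

report = verify(candidate, CHECKLIST)
is_complete = all(report.values())
print(report, is_complete)
```

In this toy instance the plan schedules every task within working hours, yet it fails both the room-overlap and the precedence checks. Because the two constraints are coupled, moving the review to start at 11 in room A would fix both at once. A plan counts as complete only when every checklist item passes.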

PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs. The framework is available on GitHub and Hugging Face. The release supports the development of more capable AI agents, a focus also seen in platforms like Arena.ai's Agent Mode.

Figure: The PlanningBench taxonomy categorizes diverse real-world planning tasks across five domains, shown with each domain's share of the task distribution.
Tencent Hy (@TencentHunyuan) on X:

Planning is where LLMs move from “saying” to “doing.” Tencent Hy, in collaboration with the Gaoling School of Artificial Intelligence at Renmin University of China, is excited to open-source PlanningBench - a scalable, verifiable framework for evaluating and training LLM planning capabilities.

With PlanningBench, you get:
✅ 30+ real-world planning tasks
✅ Automated verification
✅ Evaluation and training support

See how top-tier LLMs perform on PlanningBench 👇

Resources:
arXiv: https://t.co/N5xTRdo9KR
GitHub: https://t.co/XftHZrKGyB
HuggingFace: https://t.co/nBbddXnEDx

#PlanningBench #TencentHunyuan #OpenSource


