Tencent Hunyuan Open-Sources PlanningBench to Advance LLM Planning Capabilities

Tencent Hunyuan

Jun 7, 2026 · Updated Jun 13, 2026

Tencent Hunyuan, in collaboration with the Gaoling School of Artificial Intelligence at Renmin University of China, has open-sourced PlanningBench, a framework with over 30 real-world planning tasks for evaluating and training the planning capabilities of large language models (LLMs). The release aims to help LLMs move from generating text to autonomously executing multi-step actions in complex scenarios.

Evaluation features: Automated verification, adaptive difficulty control, instance-level verification checklists
Training support: Reinforcement learning on verified data
Task domains: Scheduling & Timetabling (28.42%), Project & Production Operations (21.90%), Routing & Traveling (17.21%), and 3 more

Planning is a fundamental capability for AI agents, enabling them to coordinate goals, constraints, and resources to achieve complex objectives. PlanningBench's evaluations show that current frontier models still struggle to produce complete solutions under coupled constraints, a challenge also observed in benchmarks like ITBench-AA for agentic tasks.
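To make "automated verification" with an instance-level checklist more concrete, here is a minimal Python sketch for a toy meeting-scheduling instance. Everything in it (the `Meeting` type, the three checks, the `verify` function, and the sample plans) is hypothetical and chosen for illustration; it is not PlanningBench's actual data format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meeting:
    """One scheduled item in a candidate plan (hypothetical schema)."""
    name: str
    room: str
    start: int  # start hour, 24h clock
    end: int    # end hour, exclusive


def all_scheduled(plan, required):
    """Every required meeting appears in the plan, and nothing else."""
    return {m.name for m in plan} == set(required)


def within_hours(plan, open_hour=9, close_hour=17):
    """Each meeting has positive length and fits inside working hours."""
    return all(open_hour <= m.start < m.end <= close_hour for m in plan)


def no_room_overlap(plan):
    """No two meetings occupy the same room at overlapping times."""
    for i, a in enumerate(plan):
        for b in plan[i + 1:]:
            if a.room == b.room and a.start < b.end and b.start < a.end:
                return False
    return True


def verify(plan, required):
    """Run every checklist item; the plan passes only if all items hold."""
    checklist = {
        "all_scheduled": all_scheduled(plan, required),
        "within_hours": within_hours(plan),
        "no_room_overlap": no_room_overlap(plan),
    }
    return all(checklist.values()), checklist


required = ["standup", "review", "planning"]

good_plan = [
    Meeting("standup", "A", 9, 10),
    Meeting("review", "A", 10, 12),
    Meeting("planning", "B", 10, 11),
]

# Two violations: "review" collides with "standup" in room A,
# and "planning" runs past the 17:00 closing hour.
bad_plan = [
    Meeting("standup", "A", 9, 10),
    Meeting("review", "A", 9, 11),
    Meeting("planning", "B", 16, 18),
]

if __name__ == "__main__":
    print(verify(good_plan, required))
    print(verify(bad_plan, required))
```

The per-item result shows which constraints a candidate plan violates, not just whether it fails. Because the constraints are coupled, repairing one violation (for example, moving "review" to another slot) can introduce another, which is the kind of difficulty the evaluation results describe. Under the same assumptions, a pass/fail outcome like this could also serve as a training signal on verified data.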

PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs. The framework is available on GitHub and Hugging Face. By supplying this data, it supports the development of more capable AI agents, a focus also seen in platforms such as Arena.ai's Agent Mode.


Tencent Hy

@TencentHunyuan · Jun 5

Planning is where LLMs move from “saying” to “doing.” Tencent Hy, in collaboration with the Gaoling School of Artificial Intelligence at Renmin University of China, is excited to open-source PlanningBench - a scalable, verifiable framework for evaluating and training LLM planning capabilities.

With PlanningBench, you get:
✅ 30+ real-world planning tasks
✅ Automated verification
✅ Evaluation and training support

See how top-tier LLMs perform on PlanningBench 👇

Resources:
arXiv: https://t.co/N5xTRdo9KR
GitHub: https://t.co/XftHZrKGyB
HuggingFace: https://t.co/nBbddXnEDx

#PlanningBench #TencentHunyuan #OpenSource


