Tencent Releases Hy3 Preview MoE Model With Controllable Reasoning Effort

Tencent has released Hy3-preview, a massive Mixture-of-Experts model (a sparse architecture that activates only specialized sub-networks per token) with 295 billion total parameters. It activates 21 billion parameters during inference and uses a 3.8-billion-parameter Multi-Token Prediction layer to accelerate training and improve reasoning efficiency. The release mirrors the industry shift toward inference-time compute scaling, following a trend set by AntLingAGI's high-reasoning models. By letting users control reasoning effort, the model can either provide direct answers or engage in deep thinking, positioning it as a cost-effective alternative to monolithic frontier models.
The weights are available on Hugging Face, and the model can be used for free via OpenRouter, which recently extended free access for other agentic models. It is optimized for agentic workflows and scores competitively on coding benchmarks. Developers can deploy it locally using vLLM or SGLang with the recommended settings for high-reasoning tasks.
Frequently asked questions
- What is Tencent Hy3-preview?
- Hy3-preview is a 295 billion parameter Mixture-of-Experts model developed by the Tencent Hy Team. It is designed for high-efficiency reasoning, activating only 21 billion parameters during inference. The model features a 256,000-token context window and is optimized for complex tasks like coding agents, search, and instruction following.
- How does the controllable reasoning effort work in Hy3-preview?
- The model supports a configurable reasoning effort parameter that allows users to toggle between different thinking modes. By setting the reasoning effort to high, the model performs deep chain-of-thought reasoning for complex math or coding tasks. Alternatively, users can select a no-think mode for direct, faster responses in standard conversational scenarios.
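The toggle described above can be sketched as a request-body helper for an OpenAI-compatible endpoint. This is a minimal illustration only: the model id and the exact parameter name (`reasoning_effort`) are assumptions based on the article's description, not a confirmed API.

```python
def build_chat_request(prompt: str, effort: str = "high") -> dict:
    """Build a chat-completions request body.

    effort: "high" for deep chain-of-thought reasoning (math, coding),
            "none" for direct, faster no-think responses.
    Both the model id and the "reasoning_effort" field are hypothetical.
    """
    if effort not in {"high", "none"}:
        raise ValueError("effort must be 'high' or 'none'")
    return {
        "model": "tencent/Hy3-preview",                      # hypothetical id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,                          # hypothetical field
    }

# Deep reasoning for a hard problem:
math_req = build_chat_request("Prove that sqrt(2) is irrational.", "high")

# Fast, direct answer for a standard conversational query:
chat_req = build_chat_request("What's the capital of France?", "none")
```

The same payload shape works with any OpenAI-compatible client; only the reasoning-effort field would change between the two modes.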
- Is Tencent Hy3-preview open source?
- Yes, Tencent has released the model weights for Hy3-preview under the Tencent Hy Community License Agreement. The weights are publicly available for download on platforms like Hugging Face, ModelScope, and GitCode. This allows developers to run the model locally or fine-tune it for specific enterprise or research applications.
- How do I deploy and run Hy3-preview?
- Developers can deploy Hy3-preview using inference frameworks like vLLM or SGLang. Because the model has 295 billion total parameters, Tencent recommends using high-memory hardware like H20-3e GPUs in an eight-GPU configuration. The model supports Multi-Token Prediction to improve performance and can be accessed via an OpenAI-compatible API once the server is running.
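A deployment along these lines might look like the sketch below, using vLLM's standard `serve` command. The Hugging Face repo id is an assumption, and the tensor-parallel size simply matches the eight-GPU configuration mentioned above; other flags will likely need tuning for this model.

```shell
# Serve the model with vLLM across 8 GPUs (repo id is an assumption).
vllm serve tencent/Hy3-preview \
    --tensor-parallel-size 8 \
    --max-model-len 262144    # ~256K-token context window

# Once the server is up, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "tencent/Hy3-preview",
         "messages": [{"role": "user", "content": "Hello"}]}'
```

SGLang exposes a similar OpenAI-compatible server, so existing OpenAI client code should work against either backend with only the base URL changed.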
- How does Hy3-preview perform on benchmarks?
- Hy3-preview shows strong performance in reasoning and coding, achieving a 74.4 percent score on SWE-bench Verified. In pre-training benchmarks, it outperforms models like DeepSeek-V3 and GLM-4.5 in math and coding tasks despite having fewer active parameters. It also demonstrates high generalizable reasoning capacity on challenging STEM exams and Olympiad-level benchmarks.



