Qwen3.5-397B-A17B FP8 Open Weights Now Available for Self-Hosting

Qwen

Feb 18, 2026 · Updated Apr 25, 2026

Alibaba open-sourced the Qwen3.5-397B-A17B-FP8 weights. It is a Mixture-of-Experts model that activates only 17B of its 397B parameters per token, and its benchmark results are competitive with frontier models at a fraction of their compute cost. SGLang support is already merged and vLLM support is expected within days, so the model can be self-hosted on standard inference infrastructure.
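A back-of-the-envelope check of the "fraction of compute" claim. This uses the common rule of thumb that a forward pass costs roughly 2 FLOPs per parameter touched per token. That approximation ignores the attention term, which grows with context length, so treat the numbers as rough per-token estimates rather than measured figures:

```latex
\frac{N_{\text{active}}}{N_{\text{total}}} = \frac{17 \times 10^{9}}{397 \times 10^{9}} \approx 0.043,
\qquad
C_{\text{token}} \approx 2 N_{\text{active}} = 3.4 \times 10^{10}\ \text{FLOPs}
\quad \text{vs.} \quad
2 N_{\text{total}} \approx 7.9 \times 10^{11}\ \text{FLOPs for a dense 397B model}
```

The saving applies to compute only. All 397B parameters still have to be resident in memory, because any token may be routed to any expert.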

Qwen3.5-397B-A17B, Alibaba's latest open-weight model, combines a sparse Mixture-of-Experts architecture with Gated Delta Networks, a linear attention mechanism that replaces standard transformer attention in most layers of this hybrid design. Of its 512 experts, only 10 routed experts plus 1 shared expert are activated per token. As a result, it runs at the compute cost of a 17B-parameter model while aiming for the reasoning depth of a dense frontier model. Native context length is 262K tokens, extensible to 1M.
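The two mechanisms above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Qwen's implementation. The article's "10+1" is read as 10 routed experts plus 1 always-on shared expert. The softmax, top-k, then renormalize gating scheme is one common choice. The "experts" here just scale their input. The function names (`route`, `moe_forward`, `gated_delta_step`) are hypothetical. The recurrence follows the gated delta rule described in the Gated DeltaNet work by Yang et al.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k):
    """Pick the top_k experts and renormalize their gate weights to sum to 1.

    Assumption: softmax over all experts, keep the top_k, then renormalize.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x, router_logits, experts, shared_expert, top_k=10):
    """The shared expert always runs; only the top_k routed experts run for this token."""
    out = shared_expert(x)
    for i, w in route(router_logits, top_k):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of the gated delta rule on a d_v x d_k state matrix S (list of rows).

    S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T, and output o_t = S_t q.
    Assumes k is L2-normalized; alpha in (0, 1] decays old memory and
    beta in [0, 1] is the write strength.
    """
    d_k = len(k)
    # S k is needed because S (I - beta k k^T) = S - beta (S k) k^T
    Sk = [sum(row[c] * k[c] for c in range(d_k)) for row in S]
    new_S = [
        [alpha * (S[r][c] - beta * Sk[r] * k[c]) + beta * v[r] * k[c] for c in range(d_k)]
        for r in range(len(S))
    ]
    o = [sum(row[c] * q[c] for c in range(d_k)) for row in new_S]
    return new_S, o


if __name__ == "__main__":
    random.seed(0)
    NUM_EXPERTS, TOP_K = 512, 10  # expert counts from the article
    # Toy experts: expert i only scales its input (a stand-in for a real FFN).
    scales = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
    experts = [lambda x, s=s: [s * xi for xi in x] for s in scales]

    def shared(x):
        return list(x)

    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(logits, TOP_K)
    print("experts run for this token:", len(chosen) + 1)  # 10 routed + 1 shared = 11
    print(moe_forward([1.0, 2.0, 3.0], logits, experts, shared, TOP_K))

    # Delta rule: with alpha = beta = 1 and a unit key, querying with k returns v.
    S = [[0.0, 0.0], [0.0, 0.0]]
    S, o = gated_delta_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[0.5, -2.0], alpha=1.0, beta=1.0)
    print(o)  # [0.5, -2.0]
```

The routing shows why per-token compute tracks the 11 active experts rather than all 512. The delta-rule step shows how a fixed-size state replaces a growing attention cache. Writing a new value under the same key overwrites the old one instead of appending to it.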

Alibaba's published benchmarks show it competitive with GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro across knowledge, coding, agentic, and multilingual tasks, with support for 201 languages. The hosted version, Qwen3.5-Plus on Alibaba Cloud, ships with a 1M-token context window and built-in tool use. The open FP8 release brings that capability to your own infrastructure.

SGLang support is merged now. A vLLM pull request has been submitted, and support is expected within the next few days. Download the weights from Hugging Face or ModelScope, check the model card for example inference code, and start experimenting.
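Before downloading, it helps to size the hardware. The sketch below estimates the raw weight footprint. It assumes exactly 1 byte per parameter for FP8 (2 for BF16) and an even split across GPUs, for example via tensor parallelism. It ignores the KV cache, activations, quantization scale tensors, and any layers the checkpoint may keep in higher precision, so a real deployment needs extra headroom. The helper names are hypothetical.

```python
TOTAL_PARAMS = 397e9  # all experts must be resident, not just the 17B active per token


def weight_gb(n_params, bytes_per_param):
    """Raw weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def per_gpu_weight_gb(n_params, bytes_per_param, num_gpus):
    """Weight size per GPU if the model is sharded evenly across num_gpus."""
    return weight_gb(n_params, bytes_per_param) / num_gpus


fp8 = weight_gb(TOTAL_PARAMS, 1)   # FP8: assumed 1 byte per weight
bf16 = weight_gb(TOTAL_PARAMS, 2)  # BF16: 2 bytes per weight, for comparison
print(f"FP8 weights:  {fp8:.0f} GB")   # 397 GB
print(f"BF16 weights: {bf16:.0f} GB")  # 794 GB
for gpus in (4, 8):
    print(f"{gpus} GPUs: {per_gpu_weight_gb(TOTAL_PARAMS, 1, gpus):.1f} GB of FP8 weights each")
```

Halving the bytes per weight is the main practical benefit of the FP8 release for self-hosting. It roughly halves the number of accelerators needed just to hold the weights.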

Qwen

@Alibaba_Qwen · Feb 18

🚀 Qwen3.5-397B-A17B-FP8 weights are now open! It took some time to adapt the inference frameworks, but here we are: ✅ SGLang support is merged 🔄 vLLM PR submitted → https://t.co/bxbqLOENTi Check the model card for example code. vLLM support landing in the next couple of days! Hugging Face: https://t.co/porwqHuKZR ModelScope: https://t.co/E7Lr8hiyWK

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Alibaba Open Sources Qwen3.6-35B-A3B Rivaling Frontier Models With 3B Active Parameters

Alibaba open-sourced Qwen3.6-35B-A3B, a sparse Mixture-of-Experts model with 35 billion total parameters that only activates 3 billion during inference. Despite its small active size, it matches frontier models in agentic coding and multimodal reasoning while operating under a permissive Apache 2.0 license.

vLLM Adds Day-0 Support for Alibaba Qwen3.6-27B Dense Model

vLLM · Apr 24

vLLM now supports Qwen3.6-27B, the flagship dense model of Alibaba's latest series, on the day of its release. This integration allows developers to immediately serve the model with high throughput using a dedicated inference recipe.

LMSYS Org Adds Day Zero SGLang Support for Qwen3.6-27B Reasoning Model

LMSYS Org · Apr 24

LMSYS Org integrated immediate support for Qwen3.6-27B into its SGLang inference framework, enabling high-speed serving of the new 27-billion-parameter model. The model outperforms the much larger Qwen3.5-397B-A17B on coding benchmarks and introduces native thinking modes for complex reasoning.

Fireworks AI Adds Managed Fine-Tuning for Qwen 3.6 27B

Fireworks AI · May 15

Fireworks AI launched managed fine-tuning for Alibaba's Qwen 3.6 27B model, supporting 256K context windows and out-of-the-box DPO. This allows developers to specialize a high-performance dense model for complex coding and reasoning tasks on a production-ready stack.