š Qwen3.5-397B-A17B-FP8 weights are now open! It took some time to adapt the inference frameworks, but here we are: ā SGLang support is merged š vLLM PR submitted ā https://t.co/bxbqLOENTi Check the model card for example code. vLLM support landing in the next couple of days! Hugging Face: https://t.co/porwqHuKZR ModelScope: https://t.co/E7Lr8hiyWK
Qwen3.5-397B-A17B FP8 Open Weights Now Available for Self-Hosting
Ā· Updated
Alibaba open-sourced Qwen3.5-397B-A17B-FP8 weights - a Mixture-of-Experts model activating only 17B of 397B parameters per token, matching frontier-model performance at a fraction of compute cost. SGLang support is merged; vLLM lands in days, making it self-hostable on standard inference infrastructure.
Benchmarks show it competitive with GPT-5.2, Claude 4.5 Opus, and Gemini-3 Pro across knowledge, coding, agents, and multilingual tasks - covering 201 languages. The hosted version, Qwen3.5-Plus on Alibaba Cloud, ships with 1M context and built-in tool use. The open FP8 release puts that capability in your own infrastructure.
SGLang support is merged now; vLLM support arrives in the next few days. Download weights from Hugging Face or ModelScope, check the model card for example inference code, and start experimenting.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards ā




