🔥 Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚡️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. 👇 Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q
Qwen 3.5 Series Releases GPTQ-Int4 Weights for Limited-GPU Inference
Qwen 3.5 Series, Alibaba's open-weight model family, now has GPTQ-Int4 quantized weights available for its larger model sizes. They work natively with vLLM and SGLang, cutting VRAM needs so teams can run larger Qwen 3.5 models on constrained GPU setups.
The quantized checkpoints span models from 35B to 397B parameters. They ship with native support for vLLM and SGLang, two popular open-source inference engines, so they load without custom serving infrastructure. GPTQ-Int4 stores model weights as 4-bit integers, which cuts the VRAM needed for the weights to roughly a quarter of what 16-bit weights require.
This brings the larger Qwen 3.5 models within reach of teams that don't have access to high-end GPU clusters. Running a 35B-parameter model, for instance, becomes feasible on hardware that previously couldn't accommodate it, expanding who can self-host frontier-class open-weight models.
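To put the VRAM savings in rough numbers, here is a back-of-the-envelope sketch. It assumes a 16-bit baseline and that every parameter is stored at the stated bit width. Real GPTQ checkpoints also carry per-group scales and zero points, may keep some layers at higher precision, and need extra memory for the KV cache and activations, so treat the output as a lower bound on weight storage only. The helper name `weight_memory_gb` is illustrative and not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumption: all parameters use the same bit width; quantization
    metadata, KV cache, and activations are ignored.
    """
    return num_params * bits_per_weight / 8 / 1e9


# The two ends of the size range mentioned in the article.
for name, params in [("35B", 35e9), ("397B", 397e9)]:
    fp16 = weight_memory_gb(params, 16)  # assumed 16-bit baseline
    int4 = weight_memory_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions the 35B model drops from about 70 GB of weights to about 17.5 GB, and the 397B model from about 794 GB to about 198.5 GB.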
Grab the weights from Hugging Face or ModelScope. Both vLLM and SGLang users can load the GPTQ-Int4 checkpoints with the same configuration they use for standard models.
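As a minimal sketch of what serving one of these checkpoints might look like, the snippet below builds the launch commands for both engines. The repository ID is a placeholder, not a real model name; substitute the ID shown on the model card. The parallelism values are assumptions that depend on your hardware. The helper functions are hypothetical conveniences; only the `vllm serve` and `python -m sglang.launch_server` entry points and their flags come from the engines themselves.

```python
import shlex

# Placeholder: replace with the repository ID from the Hugging Face
# or ModelScope model card. This is NOT a real model name.
MODEL_ID = "your-org/your-qwen3.5-gptq-int4-checkpoint"


def vllm_serve_command(model_id: str, tensor_parallel_size: int = 1) -> list[str]:
    """Build the argv for vLLM's OpenAI-compatible server."""
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


def sglang_launch_command(model_id: str, tp_size: int = 1) -> list[str]:
    """Build the argv for SGLang's server launcher."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_id,
        "--tp-size", str(tp_size),
    ]


# Print shell-ready commands; run them in an environment where the
# corresponding engine is installed.
print(shlex.join(vllm_serve_command(MODEL_ID, tensor_parallel_size=2)))
print(shlex.join(sglang_launch_command(MODEL_ID, tp_size=2)))
```

Both engines typically read the quantization settings from the checkpoint's own configuration, which is consistent with the article's point that no special setup is needed, but check each engine's documentation for your version.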