HeadsUpAI

Qwen 3.5 Series Releases GPTQ-Int4 Weights for Limited-GPU Inference

GPTQ-Int4 quantized weights are now available for the Qwen 3.5 Series, covering models from 35B to 397B parameters. The weights ship with native support for vLLM and SGLang, two popular open-source inference engines, so they load without custom serving infrastructure. GPTQ-Int4 stores model weights as 4-bit integers, which significantly reduces the VRAM footprint.
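To make the 4-bit storage idea concrete, here is a minimal sketch of packing eight unsigned 4-bit values into one 32-bit word and unpacking them again. It is a generic illustration only: the function names are hypothetical, and the real layout used by a given GPTQ kernel (value ordering, group-wise scales, zero points) is not described in the announcement and may differ.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) into 32-bit words, eight per word.

    Illustrative layout (an assumption): value j of each group of eight
    occupies bits 4*j .. 4*j+3 of the word.
    """
    if len(values) % 8 != 0:
        raise ValueError("number of values must be a multiple of 8")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[start:start + 8]):
            if not 0 <= v <= 15:
                raise ValueError(f"value {v} does not fit in 4 bits")
            word |= v << (4 * j)
        words.append(word)
    return words


def unpack_int4(words):
    """Recover the 4-bit values from words produced by pack_int4."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(8)]


values = list(range(16))          # sixteen 4-bit values
packed = pack_int4(values)        # stored in two 32-bit words
restored = unpack_int4(packed)    # round-trips to the original values
```

Sixteen values that would occupy sixteen 16-bit slots in a half-precision checkpoint fit in two 32-bit words here, which is where the memory saving comes from.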

This brings the larger Qwen 3.5 models within reach of teams that don't have access to high-end GPU clusters. Running a 35B-parameter model, for instance, becomes feasible on hardware that previously couldn't accommodate it, which widens the pool of teams able to self-host frontier-class open-weight models.
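A rough back-of-the-envelope estimate shows why. The sketch below computes weight storage alone for the two parameter counts named in the announcement, assuming a 16-bit baseline (an assumption, since the post does not state the original precision). It ignores quantization scales and zero points, any layers left unquantized, the KV cache, and activations, so real VRAM use will be higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the announcement's stated range.
estimates = {}
for name, num_params in [("35B", 35e9), ("397B", 397e9)]:
    fp16_gb = weight_memory_gb(num_params, 16)   # assumed 16-bit baseline
    int4_gb = weight_memory_gb(num_params, 4)    # 4-bit quantized weights
    estimates[name] = (fp16_gb, int4_gb)
    print(f"{name}: ~{fp16_gb:.1f} GB at 16-bit, ~{int4_gb:.1f} GB at 4-bit")
```

Under these assumptions the weights shrink by roughly a factor of four: about 70 GB to about 17.5 GB for 35B parameters, and about 794 GB to about 198.5 GB for 397B.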

Grab the weights from Hugging Face or ModelScope. Both vLLM and SGLang users can load the GPTQ-Int4 checkpoints with the same configuration they use for standard models.
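As a minimal sketch of what loading might look like with vLLM's offline Python API, the snippet below passes only a model identifier, in line with the claim that no special configuration is needed. The model identifier is a placeholder, not a real repository name; the actual names are on the Hugging Face and ModelScope pages. The sampling values are arbitrary, and running it requires vLLM installed on a GPU with enough memory.

```python
# Placeholder (hypothetical): replace with the actual repository name
# listed on Hugging Face or ModelScope.
MODEL_ID = "your-org/your-gptq-int4-checkpoint"


def generate(model_id, prompt, max_tokens=64):
    """Load a checkpoint with vLLM and return one completion for `prompt`."""
    # Imported lazily so this file can be read or imported without vLLM.
    from vllm import LLM, SamplingParams

    # No explicit quantization argument is passed here, on the assumption
    # that the checkpoint's own config declares its quantization method.
    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Example call (not executed here; needs a GPU and a real checkpoint name):
# print(generate(MODEL_ID, "Explain GPTQ quantization in one sentence."))
```

The official example code linked from the announcement should take precedence over this sketch wherever the two differ.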

Qwen (@Alibaba_Qwen) on X

🔥 Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚡️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. 👇 Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q

83 retweets