Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. Less VRAM. Faster inference. Run powerful models on limited-GPU setups. Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q
Qwen 3.5 Series Releases GPTQ-Int4 Weights for Limited-GPU Inference
GPTQ-Int4 quantized weights are now available for the Qwen 3.5 Series, covering models from 35B to 397B parameters. The weights ship with native support for vLLM and SGLang, two popular open-source inference engines, so no additional setup or custom serving infrastructure is required to run them. GPTQ-Int4 packs model weights into 4-bit integers, which significantly reduces the VRAM footprint.
This brings the larger Qwen 3.5 models within reach of teams that don't have access to high-end GPU clusters. Running a 35B-parameter model, for instance, becomes feasible on hardware that previously couldn't accommodate it, expanding who can self-host frontier-class open-weight models.
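To make the 4-bit packing and the VRAM saving concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, the bit layout is a simplified stand-in rather than the exact on-disk GPTQ format, and the memory estimate counts weights alone. It ignores the per-group scales and zero points that GPTQ also stores, any layers left unquantized, the KV cache, and activations.

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit values (0..15) into one 32-bit word.

    Illustrative layout: value i occupies bits 4*i .. 4*i+3.
    """
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("each value must fit in 4 bits")
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Recover the eight 4-bit values from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def weight_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB (weights only, no overhead)."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    word = pack_int4([1, 2, 3, 4, 5, 6, 7, 8])
    print(unpack_int4(word))
    # Rough comparison for a 35B-parameter model at 16-bit and 4-bit.
    for bits in (16, 4):
        print(f"{bits}-bit: {weight_gib(35e9, bits):.1f} GiB")
```

Under these assumptions, a 35B-parameter model needs roughly 65 GiB for 16-bit weights and roughly 16 GiB at 4 bits, a 4x reduction before accounting for quantization metadata and runtime memory.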
Grab the weights from Hugging Face or ModelScope. Both vLLM and SGLang users can load the GPTQ-Int4 checkpoints with the same configuration they use for standard models.
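As a rough sketch of what loading such a checkpoint might look like with vLLM's offline Python API: the repository ID below is a placeholder (the actual repo names are on the Hugging Face and ModelScope pages), and the helper names, GPU count, and context length are assumptions for illustration. No quantization argument is passed, on the assumption that vLLM picks up the quantization method from the checkpoint's own config.

```python
# Placeholder repository ID: substitute the real Qwen 3.5 GPTQ-Int4 repo name.
MODEL_ID = "your-org/your-qwen3.5-gptq-int4-checkpoint"


def build_engine_kwargs(model_id, num_gpus=1, max_model_len=8192):
    """Build keyword arguments for vllm.LLM (hypothetical helper).

    These are the same arguments one would pass for an unquantized model.
    """
    return {
        "model": model_id,
        "tensor_parallel_size": num_gpus,  # shard across this many GPUs
        "max_model_len": max_model_len,    # cap context to limit KV-cache memory
    }


def generate(prompt, **engine_kwargs):
    """Load the model and return one completion for the prompt."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs)
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    kwargs = build_engine_kwargs(MODEL_ID, num_gpus=1)
    print(generate("Explain GPTQ quantization in one sentence.", **kwargs))
```

The values of `num_gpus` and `max_model_len` depend on the model size and the available VRAM; larger models in the series will likely need several GPUs even at 4 bits.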
Qwen
@Alibaba_Qwen



