Qwen-Image-2.0-Pro is now live 🚀🚀 We’ve pushed image quality, multilingual text rendering, and instruction following to a new level, while making performance much more consistent across styles. 🌅🌃 Ranked #9 worldwide for Text-to-Image on @arena. 🔗 Try it now on ModelScope: https://t.co/pPtrbjzzBK https://t.co/raB6WWMEMP API: https://t.co/EgYS5qt2bF
Alibaba Qwen Launches Qwen-Image-2.0-Pro for Professional Infographics and 2K Design
Alibaba, the technology group behind the Qwen series of open-source models, has launched Qwen-Image-2.0-Pro, a unified "omni" model that combines image generation and editing in a single architecture. It pairs an 8B-parameter encoder with a 7B-parameter diffusion decoder (the component that turns mathematical representations into pixels) and supports native 2K resolution.
- Architecture: Unified generation and editing
- Decoder parameters: 7B
- Encoder parameters: 8B
- Native resolution: 2048x2048
- Instruction limit: 1K tokens
- Arena ranking: #9 Text-to-Image
- Availability: API, ModelScope
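To put the headline figures side by side, here is a minimal Python sketch that records them and derives two numbers the spec list does not state outright: the combined parameter count and the pixel count of a native-resolution image. The class and field names are illustrative, and summing encoder and decoder assumes the model has no other large components, which the article does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Figures quoted in the article; the field names are illustrative."""
    encoder_params_b: int        # encoder size, in billions of parameters
    decoder_params_b: int        # diffusion decoder size, in billions
    width_px: int                # native output width
    height_px: int               # native output height
    max_instruction_tokens: int  # "1K tokens", read here as 1,000

    @property
    def total_params_b(self) -> int:
        # Assumption: encoder + decoder are the only sizeable components.
        return self.encoder_params_b + self.decoder_params_b

    @property
    def megapixels(self) -> float:
        return self.width_px * self.height_px / 1_000_000


SPEC = ModelSpec(
    encoder_params_b=8,
    decoder_params_b=7,
    width_px=2048,
    height_px=2048,
    max_instruction_tokens=1000,
)

if __name__ == "__main__":
    print(f"{SPEC.total_params_b}B parameters, {SPEC.megapixels:.2f} MP per image")
```

Under those assumptions the encoder and decoder together come to 15B parameters, and a 2048x2048 output holds roughly 4.19 million pixels.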
This release mirrors the shift toward functional design seen in OpenAI's ChatGPT Images 2.0. While most models struggle with long prompts, this version supports 1K-token instructions to render complex assets like multi-panel comics and infographics. It currently ranks #9 globally on the Text-to-Image Arena leaderboard.
You can use the model for workflows requiring precise typography, such as generating PPT slides or posters with structured grids. It is available via API on Alibaba Cloud's Model Studio and for testing on ModelScope. The unified architecture allows you to edit photos by adding text without switching models.
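The practical effect of a unified model can be sketched as a single request type that covers both tasks. The example below is hypothetical throughout: the model identifier, field names, and payload shape are assumptions made for illustration and are not taken from the Model Studio or ModelScope APIs. It makes no network call.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifier; the article does not give the real API model name.
MODEL_ID = "qwen-image-2.0-pro"


@dataclass
class ImageRequest:
    prompt: str
    # Paths or URLs of source images. Empty means "generate from scratch".
    input_images: List[str] = field(default_factory=list)
    size: str = "2048x2048"

    @property
    def task(self) -> str:
        # Generation and editing differ only in whether source images exist.
        return "edit" if self.input_images else "generate"

    def to_payload(self) -> dict:
        # Illustrative payload shape, not a documented request format.
        return {
            "model": MODEL_ID,
            "task": self.task,
            "prompt": self.prompt,
            "images": list(self.input_images),
            "size": self.size,
        }


poster = ImageRequest(prompt="A conference poster with a three-column grid")
caption = ImageRequest(
    prompt="Add the caption 'Summer Sale' in the top-left corner",
    input_images=["photo.jpg"],
)
```

Both requests carry the same model identifier. Only the presence of `input_images` separates editing from generation, which is the "no switching models" property described above.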
Qwen
@Alibaba_Qwen
Still wondering? A few quick answers below.
Qwen-Image-2.0-Pro is a foundational image model from Alibaba designed for high-fidelity image generation and editing. It uses a 7B-parameter diffusion decoder and an 8B-parameter encoder to produce native 2K resolution images. The model is specifically optimized for professional design tasks, including the creation of infographics, posters, and complex visual layouts with accurate text.
The model features a professional typography engine that supports instructions up to 1,000 tokens in length. This allows it to render large volumes of text across different styles, including calligraphy and small regular scripts. It maintains high accuracy in multilingual text rendering and can precisely align text within structured elements like calendars, tables, and comic speech bubbles.
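When drafting long layout prompts, it can help to check them against the 1,000-token limit before sending. The sketch below assumes a pluggable token counter, because the article does not say which tokenizer the model uses. The default counter simply splits on whitespace, which is a crude proxy and will undercount for languages written without spaces, such as Chinese. All function names here are illustrative.

```python
from typing import Callable, Tuple

MAX_INSTRUCTION_TOKENS = 1000  # limit quoted in the article


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def check_budget(
    prompt: str,
    count_tokens: Callable[[str], int] = rough_token_count,
    limit: int = MAX_INSTRUCTION_TOKENS,
) -> Tuple[int, bool]:
    """Return (tokens_used, fits_within_limit) for a prompt."""
    used = count_tokens(prompt)
    return used, used <= limit


if __name__ == "__main__":
    used, ok = check_budget("A calendar page for March with holidays in red")
    print(f"{used} tokens (approx.), within limit: {ok}")
```

Passing the model's actual tokenizer as `count_tokens` would give an exact figure. The whitespace default is only useful as a rough early warning.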
Qwen-Image-2.0-Pro is a unified omni model that merges previously separate generation and editing tracks into a single architecture. It can therefore handle both text-to-image generation and precise image editing, such as adding text to existing photos or merging subjects from different images, without switching between specialized pipelines or models.
The model supports native 2K resolution, producing images at 2048 by 2048 pixels with fine detail in textures such as skin pores and fabric. It can process complex instructions of up to 1,000 tokens, enabling detailed infographics and multi-panel comics. Despite its high performance, it relies on a relatively compact 7B-parameter diffusion decoder, which is intended to keep inference fast.
You can access Qwen-Image-2.0-Pro through the Alibaba Cloud Model Studio API or test it via the ModelScope platform. It is also available for interactive use on the Qwen Studio website. The model is designed to be efficient enough for production use, offering a balance between visual fidelity and the speed required for real-time design and editing workflows.





