Alibaba Qwen3.5-Omni Launches with Native Audio Visual Vibe Coding

Qwen

Mar 31, 2026 · Updated Apr 25, 2026

Alibaba Cloud released Qwen3.5-Omni, a native omni-modal foundation model that processes text, images, audio, and video within a single end-to-end architecture. This update introduces real-time interactive capabilities and a new feature called Audio-Visual Vibe Coding.

Alibaba Cloud's Tongyi Lab launched Qwen3.5-Omni, the latest generation of its foundation model series. This version is natively omni-modal, meaning it processes text, image, audio, and video inputs through a single, end-to-end architecture. It achieves state-of-the-art performance across all four modalities simultaneously.

The release marks a shift toward real-time, multi-sensory interaction. By integrating audio-visual understanding directly into the core model, Qwen3.5-Omni can perform complex tasks like automatic video segmentation and script generation that accounts for character relationships. It maintains high performance across all modalities while advancing real-time interaction.

You can now use Qwen3.5-Omni for workflows requiring fine-grained video analysis or real-time conversational agents. The standout Audio-Visual Vibe Coding feature introduces a multi-sensory approach to development. The model is available for online serving via the vLLM Python client, supporting query types for audio and video.

View the full update on qwen.ai

Qwen

@Alibaba_QwenMar 30

🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: 'Audio-Visual Vibe Coding'. https://t.co/6YOpqOFxG1

4083.2k

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Qwen →

Keep reading

Alibaba Releases Qwen3.7-Plus to Automate Software Tasks via Unified Vision and Code

Alibaba released Qwen3.7-Plus, a multimodal foundation model that integrates vision and language reasoning into a single agent loop. The update enables AI agents to perceive graphical interfaces and execute command-line operations to automate complex software development and productivity workflows.

Alibaba Qwen3.7 Preview Enters Arena Top 15 for Text and Vision

ArenaMay 18

Alibaba Qwen3.7 Preview Enters Arena Top 15 for Text and Vision

Alibaba's Qwen3.7 Max and Plus preview models have debuted on the Arena.ai leaderboards, ranking #13 in text and #16 in vision. The results establish Alibaba as a top-six global AI lab with specific strengths in math, software engineering, and expert-level reasoning.

Fireworks AI Adds Qwen 3.5 Training to Build Custom Reasoning Agents

Fireworks AIApr 30

Fireworks AI Adds Qwen 3.5 Training to Build Custom Reasoning Agents

Fireworks AI integrated Alibaba's Qwen 3.5 into its training platform, supporting full-parameter fine-tuning and reinforcement learning with a 256K context window. This allows developers to customize the high-performance open-weight model for specialized reasoning and coding tasks on a unified stack.

vLLM Adds Day-0 Support for Alibaba Qwen3.6-27B Dense Model

vLLMApr 24

vLLM Adds Day-0 Support for Alibaba Qwen3.6-27B Dense Model

vLLM now supports Qwen3.6-27B, the flagship dense model of Alibaba's latest series, on the day of its release. This integration allows developers to immediately serve the model with high throughput using a dedicated inference recipe.