HeadsUpAI

ComfyUI Integrates Google Gemma 4 and Netflix VOID for Multimodal Video Workflows

ComfyUI, a node-based AI orchestration platform, has natively integrated three new open-source models. The update adds Google's Gemma 4, a multimodal LLM (a model that processes text, image, audio, and video input) with a built-in reasoning mode, alongside Netflix's VOID video object removal model and the BiRefNet background removal utility.
Gemma 4 inputs: Text, image, audio, and video
Gemma 4 feature: Built-in step-by-step reasoning mode
VOID capabilities: Video object, shadow, and reflection removal
BiRefNet capability: Background removal
Availability: Native ComfyUI nodes and cloud templates

This integration extends ComfyUI beyond image generation toward multimodal reasoning. VOID handles video object removal, including the shadows and reflections an object leaves behind, while Gemma 4 lets a workflow analyze the content it is processing. The combination reflects a broader industry move toward agentic workflows that pair reasoning models with specialized media tools.

You can now run these models through cloud templates or as native nodes in a local install, which makes it possible to automate multi-step edits. Gemma 4 can analyze video frames to guide generation, while VOID erases objects along with their shadows and reflections. All three models are open source.
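As an illustration of what automating a local workflow could look like, here is a minimal sketch that builds a background-removal graph and prepares it for submission to a locally running ComfyUI server. It assumes the server is listening on its default address and accepts API-format workflows on its `/prompt` endpoint. The BiRefNet node class name (`BiRefNetRemoveBackground`), its input name, and the helper function names are hypothetical placeholders; check your own ComfyUI install for the actual node names.

```python
import json
import urllib.request
import uuid

# Assumption: a local ComfyUI server on its default address and port.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"


def build_background_removal_workflow(image_name: str) -> dict:
    """Return an API-format graph: load an image, remove its background, save it.

    Each key is a node id. A link to another node is written as
    [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        # Hypothetical class and input names: look up the real BiRefNet node.
        "2": {
            "class_type": "BiRefNetRemoveBackground",
            "inputs": {"image": ["1", 0]},
        },
        "3": {
            "class_type": "SaveImage",
            "inputs": {"images": ["2", 0], "filename_prefix": "cutout"},
        },
    }


def build_request(workflow: dict, url: str = COMFYUI_URL) -> urllib.request.Request:
    """Wrap the workflow in the JSON body expected by the /prompt endpoint."""
    payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict) -> dict:
    """Send the workflow to the server and return its parsed JSON reply."""
    with urllib.request.urlopen(build_request(workflow)) as response:
        return json.loads(response.read())
```

Calling `queue_workflow(build_background_removal_workflow("portrait.png"))` would queue the job on a running server. The request is built in a separate function so the payload can be inspected without a network connection.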

ComfyUI (@ComfyUI) on X:

Three new open-source models just landed in ComfyUI natively: → Gemma 4 (Google DeepMind) - multimodal LLM handling text, image, audio, and video input with built-in step-by-step reasoning mode → VOID (Netflix) - video object removal that also erases shadows, reflections, and https://t.co/K1cTS7ECCg

63 retweets, 594 likes

Still wondering? A few quick answers below.

What is Google Gemma 4?
Google Gemma 4 is a multimodal large language model developed by Google that is now natively supported in ComfyUI. It accepts text, image, audio, and video inputs. A key feature is its built-in step-by-step reasoning mode, which lets the model work through analysis tasks inside a visual workflow.

What is VOID?
VOID is a video inpainting model developed by Netflix that is now integrated into ComfyUI. Unlike standard object removal tools, VOID is designed to erase not just the target object but also the shadows and reflections it casts. The result is cleaner video edits that stay visually consistent across frames.

Are the new models open source?
Yes, the new models integrated into ComfyUI are open source. This includes Google's Gemma 4, Netflix's VOID video inpainting tool, and the BiRefNet background removal utility. Because they are open source, developers and creators can run them locally in their own ComfyUI environments or use the official cloud templates for immediate testing and deployment.

What is BiRefNet?
BiRefNet is a utility model integrated into ComfyUI for removing backgrounds from images. It is designed to provide high-quality subject isolation, a common step in generative AI pipelines. Users can access it natively through ComfyUI nodes or through the pre-configured cloud templates mentioned in the official announcement.
