Three new open-source models just landed in ComfyUI natively:
→ Gemma 4 (Google DeepMind) - multimodal LLM handling text, image, audio, and video input with built-in step-by-step reasoning mode
→ VOID (Netflix) - video object removal that also erases shadows, reflections, and https://t.co/K1cTS7ECCg
ComfyUI Integrates Google Gemma 4 and Netflix VOID for Multimodal Video Workflows
ComfyUI has added native support for Google's Gemma 4 multimodal model and Netflix's VOID video inpainting tool in its node-based platform. Users can now pair a reasoning-capable language model with video object removal that also erases the shadows and reflections an object leaves behind. Because these open-source models run as nodes in a visual workflow, creators can chain them into automated pipelines for media editing and analysis.
- Gemma 4 inputs: Text, image, audio, and video
- Gemma 4 feature: Built-in step-by-step reasoning mode
- VOID capabilities: Video object, shadow, and reflection removal
- BiRefNet capability: Background removal
- Availability: Native ComfyUI nodes and cloud templates
This integration extends ComfyUI beyond image generation toward multimodal reasoning. VOID handles video object removal, including shadows and reflections, while Gemma 4 lets a workflow analyze the content it is processing. The pairing mirrors the broader industry move toward agentic workflows that combine a reasoning model with specialized media tools.
You can run these models through cloud templates or as local nodes to automate multi-step editing. For example, Gemma 4 can analyze video frames to guide generation, while VOID removes unwanted objects from the footage. Both are available as open-source integrations inside the ComfyUI interface.
Every HeadsUpAI update is written based on its original source and reviewed before it's published.




