NVIDIA Adds Day-One Support for Google DeepMind's DiffusionGemma Model

NVIDIA

NVIDIA announced day-one support for Google DeepMind's DiffusionGemma, an experimental model that generates 256 tokens in parallel per step. BF16 and NVFP4 checkpoints are available on Hugging Face, alongside free GPU-accelerated endpoints and vLLM support with FP8 precision. NVIDIA reports throughput of more than 150 tokens per second on DGX Spark and more than 1,000 on a single H100 GPU.
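To put the quoted figures in perspective, the rough arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a benchmark: the function names are hypothetical, the throughput is assumed to be steady, prompt processing and warm-up are ignored, and the step count is only a lower bound that assumes each step yields at most 256 tokens.

```python
import math


def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate seconds needed to emit num_tokens at a steady throughput.

    Assumption: throughput is constant; prompt processing is ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def min_parallel_steps(num_tokens: int, tokens_per_step: int = 256) -> int:
    """Lower bound on generation steps if each step yields at most tokens_per_step tokens.

    Assumption: 256 is the per-step figure quoted in the announcement; a real
    run may need more steps than this bound.
    """
    if tokens_per_step <= 0:
        raise ValueError("tokens_per_step must be positive")
    return math.ceil(num_tokens / tokens_per_step)


if __name__ == "__main__":
    response_tokens = 1500  # hypothetical response length
    for label, tps in [("DGX Spark (150 TPS)", 150.0), ("H100 (1,000 TPS)", 1000.0)]:
        seconds = generation_time_seconds(response_tokens, tps)
        print(f"{label}: about {seconds:.1f} s for {response_tokens} tokens")
    print(f"At least {min_parallel_steps(response_tokens)} steps of 256 tokens")
```

Under these assumptions, a 1,500-token response would take about 10 seconds at 150 tokens per second and about 1.5 seconds at 1,000 tokens per second. Because the announcement gives "150+" and "1,000+", these times are upper estimates.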

NVIDIA AI (@NVIDIAAI) on X:

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100. We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision

Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs


Every HeadsUpAI update is written based on its original source and reviewed before it's published.