Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.

We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision

Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs

NVIDIA Adds Day-One Support for Google DeepMind's DiffusionGemma Model
NVIDIA announced day-one support for Google DeepMind's DiffusionGemma, an experimental model that generates 256 tokens in parallel per step. BF16 and NVFP4 checkpoints are available on Hugging Face, alongside free GPU-accelerated endpoints and vLLM support with FP8 precision. The model delivers more than 150 tokens per second on DGX Spark and more than 1,000 on a single H100 GPU.



