Long video generation is a systems problem. Introducing LongLive-2.0 from NVIDIA Research: an end-to-end NVFP4 training and inference system for long video generation. Low-precision deployment often relies on post-training quantization, creating a gap between how models are trained and how they run. LongLive-2.0 aligns NVFP4-aware training, distillation, and W4A4 inference, maintaining strong benchmark quality while improving speed and memory efficiency.
NVIDIA Launches LongLive-2.0 for 4-Bit Long Video Generation Infrastructure
LongLive-2.0 applies 4-bit (NVFP4) precision to the entire long-video generation workflow. Unlike standard methods that quantize models after training, this system uses 4-bit-aware training and distillation to ensure the model is optimized for low-precision deployment from the start.

- Inference speed: 45.7 FPS (GB200)
- Training speedup: 2.1x over BF16
- Peak memory: 19.4GB (NVFP4 KV cache)
- Resolution: 720p
- Availability: GitHub (Code, Models, Paper)
Long video generation is a systems challenge because memory and compute requirements scale sharply with duration. LongLive-2.0 addresses this by implementing W4A4 inference and an NVFP4 KV cache, reducing peak memory to 19.4GB. This efficiency allows NVIDIA's SANA-WM world model to run on Blackwell hardware at 45.7 frames per second.
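To make the 4-bit idea concrete, here is a minimal sketch of NVFP4-style block quantization in plain Python. It is an illustration, not LongLive-2.0's code. It assumes the commonly described NVFP4 layout: 4-bit E2M1 values with one shared scale per block of 16 elements. For simplicity the scale stays a Python float rather than an FP8 value, and the per-tensor scale is omitted. The names `quantize_block`, `fake_quantize`, and `bits_per_element` are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # assumption: one shared scale per 16 consecutive elements


def quantize_block(block):
    """Return (scale, codes); each code is a signed value from E2M1_GRID."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1]
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def fake_quantize(values):
    """Quantize then dequantize block by block (simulated, not bit-packed)."""
    out = []
    for start in range(0, len(values), BLOCK):
        scale, codes = quantize_block(values[start:start + BLOCK])
        out.extend(c * scale for c in codes)
    return out


def bits_per_element(block=BLOCK, value_bits=4, scale_bits=8):
    """Storage cost per element, including the amortized block scale."""
    return value_bits + scale_bits / block
```

Under these assumptions `bits_per_element()` returns 4.5, compared with 16 bits per element for BF16. That per-element saving is what makes a 4-bit KV cache attractive for long sequences. The sketch says nothing about how the reported 19.4GB figure is reached.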
You can generate 720p video with consistent subjects across multiple shots using a new multi-shot attention sink. The framework supports prompt switching at chunk boundaries, making it suitable for complex, minute-scale storytelling. NVIDIA has released the full project, including the research paper, code, and pre-trained models, on GitHub for immediate implementation.
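The two mechanisms in this paragraph can be sketched generically. The code below is an assumption-laden illustration, not the LongLive-2.0 implementation. It models an attention sink as a cache that pins the earliest entries and keeps a sliding window of recent ones, and it models prompt switching as a lookup that changes the active prompt only at chunk boundaries. The names `SinkWindowCache` and `prompt_for_chunk`, and the integer stand-ins for cached frames, are hypothetical. The source does not describe how the multi-shot variant selects its sink entries.

```python
from collections import deque


def prompt_for_chunk(chunk_idx, switches):
    """Pick the active prompt for a chunk.

    `switches` is a list of (start_chunk, prompt) pairs sorted by start_chunk,
    and the first pair must start at chunk 0.
    """
    if not switches or switches[0][0] != 0:
        raise ValueError("the first switch must start at chunk 0")
    active = switches[0][1]
    for start, prompt in switches:
        if start <= chunk_idx:
            active = prompt
        else:
            break
    return active


class SinkWindowCache:
    """Keep the first `num_sink` entries forever plus a window of recent ones."""

    def __init__(self, num_sink, window):
        self.num_sink = num_sink
        self.sink = []
        self.recent = deque(maxlen=window)  # oldest recent entry is evicted

    def append(self, entry):
        if len(self.sink) < self.num_sink:
            self.sink.append(entry)
        else:
            self.recent.append(entry)

    def visible(self):
        """Entries the next chunk would attend to under this scheme."""
        return self.sink + list(self.recent)


def plan(num_chunks, switches, num_sink=1, window=2):
    """Return (chunk, prompt, visible_cache) for each chunk, in order."""
    cache = SinkWindowCache(num_sink, window)
    steps = []
    for chunk in range(num_chunks):
        steps.append((chunk, prompt_for_chunk(chunk, switches), cache.visible()))
        cache.append(chunk)  # stand-in for the chunk's cached keys and values
    return steps
```

Calling `plan(5, [(0, "wide shot"), (3, "close-up")])` switches prompts at chunk 3 while the pinned sink entry from chunk 0 stays visible throughout. This is the intuition behind using a sink to keep subjects consistent across shots.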


