Announcing the results of our submission to MLPerf® Inference v6.0, a peer-reviewed industry benchmark suite for AI cloud performance. The results demonstrate Nebius’ ability to maximize efficiency for modern AI inference workloads on the latest NVIDIA Blackwell and Blackwell Ultra platforms. From single-node systems to the full-rack GB300 NVL72 configuration, these benchmarks highlight the capability of our global infrastructure to support demanding large language and multimodal models. Learn more: https://t.co/3cGkbSnKjA
Nebius Sets New Inference Performance Standards for Blackwell and Blackwell Ultra
Nebius submitted results for the MLPerf Inference v6.0 suite, testing the latest NVIDIA Blackwell and Blackwell Ultra architectures. The submission covered the HGX B200, HGX B300, and the rack-scale GB300 NVL72 system. Benchmarks focused on frontier-scale models including DeepSeek R1, the multimodal Qwen3-VL 235B, and gpt-oss 120B.
As the industry shifts toward reasoning models and multimodal systems, raw hardware access is no longer enough. These results indicate that Nebius has tuned its networking and software stack to extract high throughput from NVIDIA's newest silicon. The GB300 NVL72 configuration demonstrated linear scaling across 72 GPUs, meaning performance did not degrade as the workload spread across the full rack.
You can now use these optimized configurations for latency-sensitive applications such as agentic loops. Nebius provides these capabilities through its AI cloud platform and the Nebius Token Factory. If you are scaling production deployments of DeepSeek R1, these benchmarks offer a verified baseline for expected tokens-per-second performance.
Nebius
@nebiusai




