Nebius Sets New Inference Performance Standards for Blackwell and Blackwell Ultra

Nebius

Apr 1, 2026 · Updated Apr 25, 2026

Nebius secured 10 first-place finishes in the MLPerf Inference v6.0 benchmarks using the latest NVIDIA Blackwell and Blackwell Ultra systems. These results demonstrate linear performance scaling for frontier models like DeepSeek R1, providing a verified blueprint for high-throughput production AI infrastructure.

Nebius submitted results for the MLPerf Inference v6.0 suite, testing the latest NVIDIA Blackwell and Blackwell Ultra architectures. The submission covered the HGX B200, HGX B300, and the rack-scale GB300 NVL72 system. Benchmarks focused on frontier-scale models including DeepSeek R1, the multimodal Qwen3-VL 235B, and gpt-oss 120B.

As the industry shifts toward reasoning models and multimodal systems, raw hardware access is no longer enough. These results indicate that Nebius has tuned its networking and software stack to extract high throughput from NVIDIA's newest silicon. The GB300 NVL72 configuration demonstrated linear scaling across 72 GPUs, meaning total throughput grew in proportion to GPU count rather than degrading as the system scaled up.
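To make "linear scaling" concrete, here is a minimal sketch of how one might compute scaling efficiency from two throughput measurements. The function name and the throughput figures are hypothetical placeholders for illustration, not numbers from the Nebius submission; an efficiency of 1.0 would correspond to perfectly linear scaling.

```python
def scaling_efficiency(single_gpu_tps: float, total_tps: float, n_gpus: int) -> float:
    """Return measured throughput divided by ideal linear throughput.

    1.0 means perfectly linear scaling; values below 1.0 mean some
    throughput is lost to communication or scheduling overhead.
    """
    if n_gpus <= 0 or single_gpu_tps <= 0:
        raise ValueError("n_gpus and single_gpu_tps must be positive")
    ideal_tps = single_gpu_tps * n_gpus
    return total_tps / ideal_tps


# Placeholder figures only; these are NOT MLPerf results.
single_gpu_tps = 1_000.0   # hypothetical tokens/s on one GPU
rack_tps = 68_400.0        # hypothetical tokens/s on a 72-GPU rack

efficiency = scaling_efficiency(single_gpu_tps, rack_tps, 72)
print(f"Scaling efficiency: {efficiency:.2%}")
```

Substituting published per-system tokens-per-second figures for the placeholders gives a quick check of how close a given configuration comes to the ideal.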

You can now leverage these optimized configurations for latency-sensitive applications like agentic loops. Nebius provides these capabilities through its AI cloud platform and the Nebius Token Factory. If you are scaling production deployments of DeepSeek R1, these benchmarks offer a verified baseline for expected tokens-per-second performance.

View the full update on nebius.com

Nebius

@nebiusai · Apr 1

Announcing the results of our MLPerf® Inference v6.0, a peer-reviewed industry benchmark suite for AI cloud performance. The results of our submission demonstrate Nebius’ ability to maximize efficiency for modern AI inference workloads on the latest NVIDIA Blackwell and Blackwell Ultra platforms. From single-node systems to the full-rack GB300 NVL72 configuration, these benchmarks highlight the capability of our global infrastructure to support demanding large language and multimodal models. Learn more: https://t.co/3cGkbSnKjA



Keep reading

Perplexity Benchmarks Blackwell Performance for High Throughput Large Model Inference

Perplexity published research showing that NVIDIA's GB200 Blackwell architecture nearly halves communication latency for large Mixture-of-Experts models compared to the previous generation. The findings suggest that Blackwell is a primary platform for reducing the cost and latency of serving frontier-scale AI search.

NVIDIA Blackwell Ultra Powers DeepSeek V4 Pro at 150 Tokens Per Second

NVIDIA · Apr 25

NVIDIA reported that DeepSeek-V4-Pro achieves over 150 tokens per second on Blackwell Ultra hardware. This performance level makes 1.6-trillion parameter models viable for real-time autonomous agents. Future software updates like Dynamo and NVFP4 are expected to push these speeds even higher.

Qwen Sets 580 TPS Record for Agentic Workloads on Blackwell GPUs

Qwen · May 27

Qwen achieved a record 580 tokens per second running its Qwen3.5-397B-A17B model on NVIDIA Blackwell GPUs using the TokenSpeed inference engine. The optimization targets agentic workloads, where multi-turn reasoning and tool-calling typically suffer from high latency. By combining a hybrid attention architecture with deep kernel fusion, the system maintains high throughput even as context scales to one million tokens.

NVIDIA Nemotron 3 Ultra Claims Top US Open Weights Intelligence Spot

Artificial Analysis · Jun 1

NVIDIA released Nemotron 3 Ultra, a 550B-parameter model that leads US open-weights benchmarks with an intelligence score of 48. The model delivers high-throughput performance exceeding 300 tokens per second, significantly outpacing similarly sized frontier models from China.