Together AI DeepSeek V4 Pro Deployment Tops Industry Speed Benchmarks

Together AI

Together AI now ranks first on Artificial Analysis for DeepSeek V4 Pro inference, with an output speed of 211.9 tokens per second. The company credits its lead across 11 providers to inference-systems optimizations, including custom KV cache management, prefix reuse, and kernel tuning on NVIDIA HGX B200 hardware. The deployment posts both the lowest latency and the highest output speed for the model.

Together AI leads the benchmark on both output speed and latency across eleven major inference providers.
Together AI (@togethercompute) on X:

DeepSeek V4 Pro on Together AI is now #1 on Artificial Analysis for both output speed and latency. Serving V4 well is an inference systems problem: KV cache, prefix reuse, kernels, and endpoint profiles. We break down the systems work here: https://t.co/RLHi35DFif
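Prefix reuse, one of the optimizations named above, can be illustrated with a toy example. The sketch below is not Together AI's implementation, which the source does not describe. It is a minimal, assumed design in which prompts are split into fixed-size token blocks and a block's cached state is reused only when every token before it also matches. The names `PrefixCache`, `prefill`, and `BLOCK_SIZE` are hypothetical, and strings stand in for real KV tensors.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tokens per cached block (illustrative value, an assumption)


class PrefixCache:
    """Toy prefix cache mapping token prefixes to a stand-in for KV state."""

    def __init__(self) -> None:
        # Key: the full token prefix up to the end of a block, so a block is
        # reusable only when everything before it matches as well.
        self._blocks: Dict[Tuple[int, ...], str] = {}

    def match(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached state."""
        matched = 0
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            if tuple(tokens[:end]) not in self._blocks:
                break
            matched = end
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Record placeholder state for every full block of the prompt."""
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            self._blocks.setdefault(tuple(tokens[:end]), f"kv[0:{end}]")


def prefill(cache: PrefixCache, tokens: List[int]) -> int:
    """Return the number of prompt tokens that still need to be computed."""
    reused = cache.match(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = list(range(8))  # shared 8-token prefix (made-up token ids)
    print(prefill(cache, system_prompt + [100, 101]))       # 10: cold cache
    print(prefill(cache, system_prompt + [200, 201, 202]))  # 3: prefix reused
```

In this toy setup the second request computes only its 3 new tokens, because the 8-token shared prefix was cached by the first request. Production systems apply the same idea to actual attention key/value tensors and must also handle eviction and memory limits, which this sketch leaves out.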


Every HeadsUpAI update is written based on its original source and reviewed before it's published.
