DeepSeek V4 Pro on Together AI is now #1 on Artificial Analysis for both output speed and latency. Serving V4 well is an inference systems problem: KV cache, prefix reuse, kernels, and endpoint profiles. We break down the systems work here: https://t.co/RLHi35DFif
Together AI DeepSeek V4 Pro Deployment Tops Industry Speed Benchmarks
Together AI now ranks first on Artificial Analysis for DeepSeek V4 Pro inference, delivering 211.9 tokens per second. This performance lead across 11 providers stems from inference systems optimizations, including custom KV cache management, prefix reuse, and kernel tuning on NVIDIA HGX B200 hardware. The deployment achieves the lowest latency and highest output speed for the model.