Together AI Delivers Real-Time Blackwell Inference Infrastructure for Cursor Agents

Together AI

Together AI built a real-time inference stack for Cursor’s in-editor coding agents on NVIDIA Blackwell hardware: GB200 NVL72 rack-scale systems and B200 GPUs. The infrastructure includes custom kernels for Blackwell Tensor Core instructions, Arm host optimization, and a quantization pipeline that moves internally trained model weights to production test endpoints within days. The stack is designed to keep latency predictable for real-time code refactoring.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
