no benchmark will tell you this: LLMs can be /too/ nice. unsurprisingly, in a competitive zero-sum setting, being nice can be bad. i built royale: last agent standing, a battle royale for agents, and ran it 30 times. the nicest model lost hard. the model you least expected won. 🧵: https://t.co/lEFpfqnIdJ
OpenRouter Agent Battle Royale Suggests Alignment Tax Hurts Competitive Performance
OpenRouter ran 11 LLMs through 30 battle royale games to test how agents perform in zero-sum settings. Grok 4.1 Fast won 13 of the 30 games at $0.97 per win, while Claude Sonnet 4.6 struggled because it prioritized cooperation. The results suggest that alignment training, which is designed for safety, can act as a performance tax in competitive tasks that reward ruthlessness.
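To put the quoted numbers in context, here is a minimal sketch of the arithmetic behind them. It assumes that "cost per win" means total spend divided by games won, which the source does not confirm, so the implied spend is an estimate rather than a reported figure. The function names are illustrative only.

```python
# Figures quoted in the summary: 30 games, 13 wins, 0.97 dollars per win.
GAMES_PLAYED = 30
WINS = 13
COST_PER_WIN_USD = 0.97


def win_rate(wins: int, games: int) -> float:
    """Share of games won."""
    return wins / games


def implied_total_spend(wins: int, cost_per_win: float) -> float:
    """Total spend implied by the quoted figures.

    Assumption (not stated in the source): cost per win = total spend / wins.
    """
    return wins * cost_per_win


rate = win_rate(WINS, GAMES_PLAYED)
spend = implied_total_spend(WINS, COST_PER_WIN_USD)

print(f"win rate: {rate:.1%}")
print(f"implied total spend: ${spend:.2f}")
```

Under that assumption, 13 wins in 30 games is a win rate of roughly 43%, and the implied spend across the run is about 12.61 dollars.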
Every HeadsUpAI update is written based on its original source and reviewed before it's published.





