Cursor Adds GPT-5.5, Its New Internal Benchmark Leader, at a 50 Percent Discount

Agentic Coding
GPT
AI Economics
Benchmark
Performance

Cursor, an AI-native code editor, has integrated OpenAI's newly released GPT-5.5 model. The model currently leads CursorBench-3, the company's internal evaluation suite, with a 72.8% correctness score. This follows the editor's earlier integration of GPT-5.4, which was the previous benchmark leader.

The update highlights limitations in public benchmarks such as SWE-bench, which often suffer from data contamination. CursorBench instead uses agentic graders to measure performance on real-world, underspecified tasks, which draws a clearer distinction between frontier models that developers already experience as meaningfully different.
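For illustration only, the sketch below shows what an agentic grading loop of this general shape might look like. The Task fields, the run_agent and judge callables, and the rubric wording are all hypothetical and are not taken from Cursor's actual harness.

```python
# Hypothetical sketch of an agentic grading loop (not Cursor's real harness).
# A model attempts an underspecified task against a repo snapshot, and a
# separate grader model compares the resulting diff against a known-good
# solution instead of relying on exact string matching.
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str          # underspecified developer request
    repo_snapshot: str   # path to the repo state the request was made against
    reference_diff: str  # ground-truth change the developer actually shipped


def grade(task: Task, candidate_diff: str, judge) -> float:
    """Ask a grader model whether the candidate change solves the task.

    `judge` is assumed to be any callable that takes a prompt string and
    returns a score between 0 and 1; an LLM client could fill that role.
    """
    rubric = (
        "You are grading a code change.\n"
        f"Task: {task.prompt}\n"
        f"Reference solution:\n{task.reference_diff}\n"
        f"Candidate solution:\n{candidate_diff}\n"
        "Return a number from 0 to 1 for functional correctness."
    )
    return judge(rubric)


def evaluate(tasks, run_agent, judge) -> float:
    """Average correctness over a task set.

    `run_agent` is a placeholder for whatever produces a diff for a task,
    e.g. a model driving edit/search/run tools until it decides to stop.
    """
    scores = [grade(t, run_agent(t), judge) for t in tasks]
    return sum(scores) / len(scores) if scores else 0.0
```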

You can now select GPT-5.5 within the Cursor editor. Through a partnership with OpenAI, the model is available at a 50% discount until May 2, 2026. This pricing applies to all users, making it cost-effective to test the model's improved persistence and tool-use reliability on long-running engineering experiments.


Frequently asked questions

What is CursorBench?
CursorBench is an internal evaluation suite used by the Cursor team to measure AI model quality. It uses real developer sessions and Cursor Blame to pair queries with ground-truth solutions. Unlike public benchmarks, it focuses on complex, multi-file tasks and monorepo navigation to better reflect how engineers actually use AI agents in production environments.
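As a rough sketch of what pairing a session query with its ground-truth change could look like, the snippet below uses invented field names and a made-up from_session helper; it is not Cursor's internal schema, only an illustration of the idea.

```python
# Hypothetical illustration of pairing a developer request with the edits
# that were ultimately kept, which is roughly what a blame-style mapping
# enables; field names are invented, not Cursor's internal schema.
from dataclasses import dataclass, field


@dataclass
class BenchmarkCase:
    query: str                                                # what the developer asked for in-session
    files_touched: list[str] = field(default_factory=list)    # tasks often span many files
    ground_truth_diff: str = ""                               # the change the developer actually shipped


def from_session(query: str, kept_edits: dict[str, str]) -> BenchmarkCase:
    """Build a case from a session: the request plus the edits that survived."""
    return BenchmarkCase(
        query=query,
        files_touched=sorted(kept_edits),
        ground_truth_diff="\n".join(kept_edits.values()),
    )
```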
What is the pricing for GPT-5.5 in Cursor?
Through a direct partnership with OpenAI, Cursor is offering GPT-5.5 at a 50 percent discount for all users. This promotional pricing is available within the editor to encourage developers to test the model's improved coding persistence and tool-use reliability. The discounted rate is scheduled to remain in effect for a limited time, ending on May 2, 2026.
How does GPT-5.5 compare to other models on CursorBench?
GPT-5.5 currently holds the top position on the CursorBench-3 leaderboard with a correctness score of 72.8 percent. This internal benchmark is designed to be more difficult than public sets like SWE-bench, as it involves larger edit sizes and ambiguous developer requests that require multi-step reasoning, tool use, and autonomous navigation across multiple files.
Why does Cursor avoid using public benchmarks like SWE-bench?
Cursor avoids public benchmarks because they often suffer from data contamination, where tasks appear in model training data and inflate scores. Many public tests also focus on narrow bug-fixing rather than complex agent workflows. CursorBench uses internal, unseen code and agentic graders to provide a more accurate assessment of how models handle real-world engineering work.
Who can use GPT-5.5 in Cursor?
GPT-5.5 is now available to all Cursor users directly within the code editor. Users can select the model from the dropdown menu to use it for chat, codebase indexing, and autonomous agent tasks. The integration allows developers to leverage the model's frontier intelligence for complex coding workflows while benefiting from a temporary 50 percent price reduction through May 2, 2026.