Arena.ai Confirms GPT-5.5 Naturally Uses Goblin and Gremlin Terms Without Restrictions

Arena

Apr 30, 2026 · Updated May 11, 2026

Arena.ai's analysis of GPT-5.5 reveals the model naturally generates terms like goblin and gremlin at a significantly higher rate than previous versions. This confirms that the model's creature obsession is an inherent behavioral trait rather than a result of specific user prompting.

Arena.ai, a community-driven AI model evaluation platform, found that GPT-5.5 naturally generates terms like "goblin" and "gremlin" at an elevated frequency. By testing the model without the restrictive system instructions found in OpenAI's first-party tools, they observed the model's raw linguistic tendencies "running free."

This finding highlights a gap between a model's trained weights and its intended persona. While OpenAI reportedly implemented specific prohibitions against these terms, the underlying behavior persists. This mirrors Owain Evans' research on hidden traits, suggesting that sanitizing output requires aggressive post-training.

For those building on GPT-5.5, this analysis shows that frontier models can develop arbitrary linguistic quirks. This behavioral study follows GPT-5.5's leaderboard entry and builds on Xiaomi's MiMo-V2.5-Pro validation. GPT-5.5 is currently live in the Arena for community evaluation.

View the full update on x.com

Arena.ai

@arenaApr 28

It's true. Here's a plot of GPT models and their usage of "goblin", "gremlin", "troll", etc over time. There's no anti-gremlin system instruction on our side, we get to see GPT-5.5 run free. https://t.co/Z8F6mTtJSS

611.2k

View on X

Still wondering? A few quick answers below.

Analysis from Arena.ai confirms that GPT-5.5 naturally generates terms like goblin, gremlin, and troll at a higher frequency than previous models. This behavior is an inherent linguistic trait of the model's trained weights. While OpenAI attempts to suppress these words in specific deployments, the raw model continues to produce them when running without restrictive system instructions.

System instructions are hidden rules that define how an AI model should behave. For GPT-5.5, OpenAI reportedly uses specific instructions in tools like Codex to prevent the model from mentioning goblins, gremlins, ogres, or certain animals. These filters are designed to sanitize the model's output and maintain a professional persona, though the underlying model still favors these terms.

Arena.ai is a community-driven platform that measures AI model performance through community-driven evaluation. Unlike OpenAI's first-party applications, Arena.ai does not apply the restrictive system instructions that normally filter GPT-5.5's output. This allows researchers to observe the model's natural behavior and linguistic tendencies, providing a clearer picture of its raw capabilities and unprompted personality traits.

Yes, GPT-5.5 is currently live on the Arena.ai platform across multiple leaderboards, including those for code and text. Users can interact with the model to evaluate its performance and compare it against other frontier models. Because Arena.ai provides access to the model without OpenAI's standard behavioral filters, it serves as a primary source for observing raw model behavior.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from Arena →

Keep reading

OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

OpenAI published a technical post-mortem tracing the goblin behavioral quirk in GPT-5 models to unintended reinforcement during personality training. The investigation reveals how a specific reward signal for a playful persona leaked into the base model behavior, creating a persistent feedback loop.

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

ArenaApr 30

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

GPT-5.5 entered the Arena.ai leaderboards with a top-two ranking in search and a 50-point performance jump in agentic web development. These community-driven results validate the model's focus on complex tool use and reasoning across vision, math, and document analysis.

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution

LovableApr 24

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution

Lovable's early testing of GPT-5.5 shows the model requires 23.1% fewer tool calls while improving performance on complex technical builds. These results demonstrate a measurable leap in agentic reasoning, allowing AI to navigate difficult coding tasks with fewer errors at the same cost as previous models.

Why does GPT-5.5 use words like goblin and gremlin?

What are the GPT-5.5 anti-gremlin system instructions?

How is GPT-5.5 tested on Arena.ai?

Is GPT-5.5 available for testing on Arena.ai?

Keep reading

OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution

Keep reading

OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

Arena.ai Ranks GPT-5.5 as Top Tier for Search and Coding

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution

Lovable Reports GPT-5.5 Gains in Efficiency and Roadblock Resolution