OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins

OpenAI

Apr 30, 2026 · Updated May 3, 2026

OpenAI published a technical post-mortem tracing the goblin behavioral quirk in GPT-5 models to unintended reinforcement during personality training. The investigation shows how a reward signal for a playful persona leaked into the model's default behavior and created a persistent feedback loop.

OpenAI released a technical investigation into a behavioral quirk in which the GPT-5.5 model series began excessively using metaphors involving goblins and gremlins. The team traced the root cause to the Nerdy personality feature, whose reward signal inadvertently favored creature-based language during reinforcement learning (training that uses rewards to shape behavior).

This phenomenon is an example of reward generalization: incentives meant for one persona spill over into the model's default behavior. Because these lexical tics were highly rewarded, they became common in model-generated data that was later used for fine-tuning, creating a feedback loop that reinforced the habit even when the persona prompt was absent.
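The feedback loop can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not OpenAI's training setup: the "model" is a single set of shared logits over three hypothetical metaphor styles, the Nerdy persona is modeled as a fixed logit offset (`persona_bias`), and the reward values in `PERSONA_REWARD` are invented. Reinforcement learning uses an exact expected policy gradient on persona-conditioned outputs only; a supervised fine-tuning step then trains the no-persona distribution on the persona-conditioned data. Because the parameters are shared, the goblin rate rises even without the persona.

```python
import math

# Hypothetical metaphor styles; only "goblin" stands for the creature tic.
STYLES = ["goblin", "analogy", "plain"]

# Invented reward values (assumption): the persona reward over-scores creature metaphors.
PERSONA_REWARD = {"goblin": 1.0, "analogy": 0.6, "plain": 0.2}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add(xs, ys):
    return [x + y for x, y in zip(xs, ys)]


def rl_step(shared, persona_bias, lr=0.5):
    """One exact policy-gradient step on the persona-conditioned policy.

    The gradient of E[R] w.r.t. logit i is p_i * (R_i - E[R]). It is applied to
    the shared logits, which the no-persona policy also uses.
    """
    probs = softmax(add(shared, persona_bias))
    rewards = [PERSONA_REWARD[s] for s in STYLES]
    expected = sum(p * r for p, r in zip(probs, rewards))
    grads = [p * (r - expected) for p, r in zip(probs, rewards)]
    return [x + lr * g for x, g in zip(shared, grads)]


def sft_step(shared, target, lr=0.5):
    """One cross-entropy gradient step pulling softmax(shared) toward target."""
    probs = softmax(shared)
    return [x - lr * (p - t) for x, p, t in zip(shared, probs, target)]


def goblin_rate(shared):
    """Probability of the creature style when no persona prompt is present."""
    return softmax(shared)[STYLES.index("goblin")]


def simulate(generations=3, steps=20):
    shared = [0.0, 0.0, 0.0]
    persona_bias = [0.5, 0.0, -0.5]  # assumption: the persona is already playful
    history = [goblin_rate(shared)]
    for _ in range(generations):
        for _ in range(steps):
            shared = rl_step(shared, persona_bias)
        # Model-generated data: the persona-conditioned output distribution.
        data = softmax(add(shared, persona_bias))
        for _ in range(steps):
            shared = sft_step(shared, data)
        history.append(goblin_rate(shared))
    return history


if __name__ == "__main__":
    for gen, rate in enumerate(simulate()):
        print(f"generation {gen}: goblin rate without persona = {rate:.3f}")
```

Using the exact expectation instead of sampled completions keeps the sketch deterministic; a real RL pipeline would estimate this gradient from sampled outputs.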

OpenAI has added system-level instructions to suppress these mentions, but users can still remove them in the Codex engineering agent with a command-line override. The episode serves as a case study for researchers on the risks of persona-driven training and the difficulty of maintaining lexical diversity.

View the full update on openai.com

OpenAI (@OpenAI) on X, Apr 30: “We’re talking about Goblins.” https://t.co/dqmcLGCW71

Still wondering? A few quick answers below.

Why does ChatGPT keep talking about goblins?

OpenAI traced the frequent mentions of goblins and gremlins to the Nerdy personality customization feature. During training, the reward signal intended to encourage a playful tone inadvertently gave high scores to creature-based metaphors. This behavior generalized across the model, so the lexical tic appeared even when the Nerdy personality prompt was not active.

How did the goblin behavior spread to other OpenAI models?

The behavior spread through a feedback loop. Once reinforcement learning rewarded the model for creature metaphors in the Nerdy persona, those outputs became more common in the model's generated data. These model-generated responses were then used for supervised fine-tuning of subsequent versions such as GPT-5.5, reinforcing the habit across the entire model family.

Can I still use the goblin metaphors in OpenAI Codex?

Yes. Although OpenAI added instructions to Codex to suppress these mentions, you can remove them manually: running a specific command-line override that modifies the models-manager configuration launches Codex with the goblin-suppressing directives removed, allowing the original, unmitigated behavior to appear in the model's outputs.

What other creatures were identified in the GPT-5 training bug?

Beyond goblins and gremlins, OpenAI identified a family of other creature words that had become unintended lexical tics, including raccoons, trolls, ogres, and pigeons. These animals and fantasy creatures were over-rewarded and showed up in irrelevant contexts. Frogs were the exception: the investigation found that most uses of the word "frog" were legitimate.

How did OpenAI fix the goblin obsession in GPT-5.5?

OpenAI retired the Nerdy personality and removed the reward signals that favored creature-based language. It also filtered training datasets to remove examples where these creatures appeared in inappropriate or irrelevant contexts. Because GPT-5.5 had already begun training, OpenAI added developer-prompt instructions to mitigate the behavior in production.
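The dataset-filtering step can be sketched as a keyword pass over prompt/response pairs. This is a minimal sketch under a loud assumption: the source does not describe OpenAI's actual filter, so the word list, the plural handling, and the relevance rule (a creature in the response counts as relevant only if the prompt mentions it too) are all hypothetical. The word "frog" is deliberately left out of the list because the investigation found most of its uses legitimate.

```python
import re

# Hypothetical word list built from the creatures named in the post-mortem.
# "frog" is omitted because most of its uses were found to be legitimate.
CREATURE_WORDS = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}

WORD_RE = re.compile(r"[a-z]+")


def creature_mentions(text):
    """Return the creature words in text, matching singular or simple "-s" plural."""
    found = set()
    for token in WORD_RE.findall(text.lower()):
        stem = token[:-1] if token.endswith("s") else token
        if stem in CREATURE_WORDS:
            found.add(stem)
    return found


def is_out_of_context(prompt, response):
    """Assumed rule: a creature is out of context if only the response mentions it."""
    return bool(creature_mentions(response) - creature_mentions(prompt))


def filter_examples(examples):
    """Drop prompt/response pairs that use creature words the prompt never asked for."""
    return [
        ex for ex in examples
        if not is_out_of_context(ex["prompt"], ex["response"])
    ]


if __name__ == "__main__":
    examples = [
        {"prompt": "Fix this off-by-one bug.",
         "response": "A sneaky gremlin lives in your loop bounds."},
        {"prompt": "Write a short story about a goblin.",
         "response": "The goblin crept into the cave."},
    ]
    for ex in filter_examples(examples):
        print(ex["prompt"])
```

Run as a script, this prints only "Write a short story about a goblin.", since the gremlin in the bug-fix answer was never requested. A keyword rule is crude; it stands in here for whatever relevance judgment a real filter would need.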

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

Arena.ai Confirms GPT-5.5 Naturally Uses Goblin and Gremlin Terms Without Restrictions

Arena.ai's analysis of GPT-5.5 reveals the model naturally generates terms like goblin and gremlin at a significantly higher rate than previous versions. This confirms that the model's creature obsession is an inherent behavioral trait rather than a result of specific user prompting.

OpenAI Finds Accidental Reasoning Grading in GPT-5 Models but No Safety Impact

OpenAI · May 8

OpenAI discovered that several released models were accidentally rewarded for their internal reasoning steps during training, a practice usually avoided to prevent AI from learning to hide its thoughts. Analysis of the affected runs showed no measurable drop in the models' honesty, though the company is implementing new automated safeguards to prevent future leaks.

Anthropic Explains Why AI Assistants Act Human With Persona Selection Theory

Anthropic · Feb 23

Anthropic published a theory called the persona selection model to explain why AI assistants act human-like. Models learn to simulate human characters during pretraining, and post-training refines that enacted persona but does not change it, with surprising implications for alignment.

Owain Evans Demonstrates LLMs Transmit Hidden Traits Through Unrelated Data

Owain Evans · Apr 15

A study published in Nature reveals that AI models can subliminally learn behavioral traits from training data that is semantically unrelated to those traits. This phenomenon allows models to inherit biases or misalignment even when training data is strictly filtered for safety.

