We’re talking about Goblins. https://t.co/dqmcLGCW71
OpenAI Explains Why GPT-5 Models Became Obsessed With Goblins
OpenAI published a technical investigation into a behavioral quirk in which the GPT-5.5 model series began overusing metaphors involving goblins and gremlins. The team traced the root cause to the Nerdy personality feature, whose reward signal inadvertently favored creature-based language during reinforcement learning (training that uses rewards to shape behavior).
The phenomenon illustrates reward generalization, in which incentives tuned for one persona leak into the model's default behavior. Because these lexical tics were highly rewarded, they showed up in the model-generated data later used for fine-tuning, creating a feedback loop that reinforced the habit even when the persona prompt was absent.
Although OpenAI has added system-level instructions to suppress these mentions, users can still remove them in the Codex engineering agent with a command-line override. The investigation serves as a case study for researchers on the risks of persona-driven training and the difficulty of maintaining lexical diversity.
OpenAI
@OpenAI
845 retweets · 8.2k likes
Still wondering? A few quick answers below.
OpenAI traced the frequent mentions of goblins and gremlins to the Nerdy personality customization feature. During training, the reward signal intended to encourage a playful tone inadvertently gave high scores to creature-based metaphors. This behavior generalized across the model, causing the lexical tic to appear even when the personality prompt was not active.
The behavior spread through a feedback loop during reinforcement learning. Once the model was rewarded for using creature metaphors in the Nerdy persona, those outputs appeared more frequently in training data. These model-generated responses were then used in supervised fine-tuning for subsequent versions like GPT-5.5, reinforcing the habit across the entire model family.
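The feedback loop described above can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the function name, the starting rate, the reward boost, and the share of persona-generated data are all hypothetical numbers chosen for illustration, not figures from OpenAI's investigation.

```python
def simulate_feedback_loop(generations, base_rate=0.01, reward_boost=1.5,
                           persona_share=0.3):
    """Return the default creature-metaphor rate after each model generation.

    All parameters are hypothetical:
      base_rate     - starting fraction of responses with a creature metaphor
      reward_boost  - how much persona-conditioned RL multiplies that fraction
      persona_share - fraction of the next fine-tuning set that comes from
                      persona-conditioned, model-generated outputs
    """
    rates = []
    rate = base_rate
    for _ in range(generations):
        # RL under the persona rewards creature metaphors, raising their rate.
        persona_rate = min(1.0, rate * reward_boost)
        # The next model is fine-tuned on a mix of default and persona outputs,
        # so its default rate drifts toward the inflated persona rate.
        rate = (1 - persona_share) * rate + persona_share * persona_rate
        rates.append(rate)
    return rates


if __name__ == "__main__":
    for generation, rate in enumerate(simulate_feedback_loop(5), start=1):
        print(f"generation {generation}: {rate:.4f}")
```

Even with a modest boost, the default rate in this toy model grows every generation, which mirrors how a persona-specific habit could spread to outputs produced without the persona prompt.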
The behavior can still be restored in Codex. Although OpenAI added instructions to suppress these mentions, users can manually remove them: running a specific command-line instruction that modifies the models-manager configuration launches Codex without the goblin-suppressing directives. This lets the original, unmitigated behavior appear in the model's outputs.
Beyond goblins and gremlins, OpenAI identified a family of other creature words that had become unintended lexical tics, including raccoons, trolls, ogres, and pigeons. The investigation found that most uses of the word "frog" were legitimate, whereas these other animals and fantasy creatures were being over-rewarded and used in irrelevant contexts.
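One rough way to spot a lexical tic like this is to count how often the listed creature words appear in a sample of responses. The sketch below uses the word list from the article; the function names, the simple plural handling, and the per-thousand metric are assumptions made for illustration and are not OpenAI's method.

```python
import re
from collections import Counter

# Creature words named in the article; matching an optional trailing "s"
# is a simplifying assumption.
CREATURE_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
CREATURE_PATTERN = re.compile(
    r"\b(" + "|".join(CREATURE_WORDS) + r")s?\b", re.IGNORECASE
)


def creature_counts(responses):
    """Count total mentions of each creature word across all responses."""
    counts = Counter()
    for text in responses:
        for match in CREATURE_PATTERN.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


def flagged_per_thousand(responses):
    """Return how many responses per 1,000 mention at least one creature."""
    if not responses:
        return 0.0
    flagged = sum(1 for text in responses if CREATURE_PATTERN.search(text))
    return 1000.0 * flagged / len(responses)


if __name__ == "__main__":
    sample = [
        "That race condition is a little gremlin hiding in the cache.",
        "Wrap the shared counter in a mutex.",
        "Two Goblins guard the config file, and one goblin guards the logs.",
    ]
    print(creature_counts(sample))
    print(flagged_per_thousand(sample))
```

Comparing this rate across model versions, or with and without a persona prompt, would show whether a word family is rising, though a raw count cannot tell legitimate uses from irrelevant ones.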
OpenAI addressed the issue by retiring the Nerdy personality and removing the specific reward signals that favored creature-based language. They also filtered training datasets to remove examples where these creatures appeared in inappropriate or irrelevant contexts. For GPT-5.5, which had already begun training, they added developer-prompt instructions to mitigate the behavior in production.
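The dataset-filtering step could look something like the sketch below. It is a minimal illustration, not OpenAI's pipeline: the `prompt`/`response` record layout and the function names are hypothetical, and "irrelevant context" is approximated by a crude proxy, namely a creature word that appears in the response but not in the prompt.

```python
import re

# Creature words named in the article.
CREATURE_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
CREATURE_PATTERN = re.compile(
    r"\b(" + "|".join(CREATURE_WORDS) + r")s?\b", re.IGNORECASE
)


def creatures_in(text):
    """Return the set of creature words (lowercase, singular) found in text."""
    return {m.group(1).lower() for m in CREATURE_PATTERN.finditer(text)}


def filter_examples(examples):
    """Keep examples whose response introduces no creature the prompt lacks.

    Each example is assumed to be a dict with "prompt" and "response" keys;
    this record layout is hypothetical.
    """
    kept = []
    for example in examples:
        unprompted = creatures_in(example["response"]) - creatures_in(example["prompt"])
        if not unprompted:
            kept.append(example)
    return kept


if __name__ == "__main__":
    data = [
        {"prompt": "Write a story about a goblin.",
         "response": "The goblin tiptoed through the market."},
        {"prompt": "Why is my build failing?",
         "response": "A gremlin in your cache is serving stale artifacts."},
        {"prompt": "Explain binary search.",
         "response": "Halve the search range on every comparison."},
    ]
    for example in filter_examples(data):
        print(example["prompt"])
```

A prompt-overlap check like this keeps the story about a goblin and drops the unprompted gremlin, but a real filter would need a better relevance signal than keyword overlap.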






