ChatGPT’s Goblin Fixation: How a Nerdy Training Glitch Exposed AI’s Hidden Feedback Loops

OpenAI’s latest models started slipping goblins into everyday chats. PowerPoint conversions? A ‘tidy little file-format goblin wearing a necktie.’ Spreadsheet woes? The ‘spreadsheet goblin is still a goblin.’ Users noticed. Employees flagged it. Even Sam Altman memed it.

The quirk traces back to November 2025 with GPT-5.1. Goblin mentions jumped 175%. Gremlins rose 52%. By GPT-5.4, the surge hit 3,881% in certain modes, according to OpenAI’s own audit, detailed in its blog post ‘Where the Goblins Came From.’ What caused it? A ‘Nerdy’ personality setting.

ChatGPT offers personality tweaks—formal, casual, playful. Nerdy aimed for an enthusiastic, curious vibe. It made up just 2.5% of responses, yet it drove 66.7% of goblin talk across the platform. Reinforcement learning from human feedback, or RLHF, scored those creature metaphors higher: in 76.2% of training datasets, goblin-laced replies rated better than plain ones. The habit bled into other personalities. Reinforcement learning doesn’t box behaviors neatly.
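That feedback loop is easy to sketch as a toy simulation. All numbers below are hypothetical, loosely borrowed from the article’s stats; this is not OpenAI’s training code, just an illustration of how a small reward bonus on a fraction of prompts steadily pulls a policy toward the rewarded behavior:

```python
import random

random.seed(0)

# Toy RLHF loop: a "policy" with one knob, the probability of reaching
# for a goblin metaphor. The reward model pays a small bonus for
# goblins on a fraction of prompts (hypothetical numbers).
GOBLIN_BONUS = 0.1       # small extra reward for creature metaphors
BIASED_FRACTION = 0.762  # share of prompts where the bonus applies
LEARNING_RATE = 0.05

def reward(uses_goblin: bool) -> float:
    base = 1.0  # quality score, goblin-agnostic
    if uses_goblin and random.random() < BIASED_FRACTION:
        base += GOBLIN_BONUS  # the skewed preference signal
    return base

p_goblin = 0.025  # start: goblins in ~2.5% of replies (the Nerdy share)
for _ in range(5000):
    uses_goblin = random.random() < p_goblin
    direction = 1.0 if uses_goblin else -1.0
    # REINFORCE-style update: move toward choices that beat the baseline
    p_goblin += LEARNING_RATE * (reward(uses_goblin) - 1.0) * direction
    p_goblin = min(max(p_goblin, 0.01), 0.99)

print(f"goblin rate after training: {p_goblin:.0%}")
```

A 2.5% habit with a mild reward edge saturates the policy: nothing in the update confines the drift to one personality, which is the point about behaviors not staying boxed.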

But here’s the kicker. OpenAI retired Nerdy in March 2026 after GPT-5.4. Removed the reward signal. Filtered training data for creature words. Still, goblins lingered in GPT-5.5 tests, especially in Codex, OpenAI’s coding agent. ‘Codex is, after all, quite nerdy,’ the company noted dryly.

So they hardcoded a fix. The Codex CLI system prompt, now public on GitHub, repeats the ban twice amid 3,500 words of rules: ‘Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.’ No emojis. No destructive git commands unless asked.
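How Codex enforces that sentence internally isn’t public. A naive post-hoc check of the kind a client could layer on top might look like this; the function name and approach are a sketch, not OpenAI’s implementation:

```python
import re

# Hypothetical creature filter mirroring the banned list from the leaked
# Codex CLI prompt. Word boundaries avoid false positives on substrings
# like "ogres" inside "progress".
BANNED = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
PATTERN = re.compile(r"\b(" + "|".join(BANNED) + r")s?\b", re.IGNORECASE)

def contains_creature(text: str) -> bool:
    """Return True if a reply mentions a banned creature."""
    return bool(PATTERN.search(text))

print(contains_creature("The spreadsheet goblin is still a goblin."))  # True
print(contains_creature("Work in progress on the merge."))             # False
```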

Adam Engst spotted holdouts anyway. In a chat about Keynote and PowerPoint, ChatGPT invoked ‘formatting goblins’ and ‘filesystem goblin’ despite his custom instructions banning them outright, as reported in TidBITS. The ban worked sometimes. Not always. A stopgap, OpenAI called it, until cleaner GPT-6 training.

The Wall Street Journal pinned it on failed attempts to instill a ‘nerdy’ tone, leading to odd word choices that forced intervention (WSJ). BBC highlighted users and staff spotting ‘little goblins’ in problem descriptions (BBC). Ars Technica broke the prompt leak first, noting social media complaints and user plugins to unleash ‘goblin mode’ (Ars Technica).

And Altman? He posted a mock prompt: ‘Start training GPT-6, you can have the whole cluster. Extra goblins.’ Wired covered the memes, including AI-generated goblins in data centers and Codex forks (Wired). Developers can still opt out via a jq-grep script stripping the suppression—OpenAI’s wink to fans.
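The opt-out script itself isn’t reproduced here, but the idea is simple: filter the suppression sentence out of a local copy of the system prompt. Sketched in Python rather than jq and grep, with an invented prompt layout:

```python
# Hypothetical sketch of the "goblin mode" opt-out: drop the creature ban
# from a local system prompt. The file layout is invented; the real
# Codex CLI config may differ.
BAN_MARKER = "Never talk about goblins"

def strip_suppression(prompt: str) -> str:
    """Remove any line of the system prompt containing the creature ban."""
    kept = [line for line in prompt.splitlines() if BAN_MARKER not in line]
    return "\n".join(kept)

original = (
    "You are Codex, a coding agent.\n"
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, "
    "pigeons, or other animals or creatures unless it is absolutely "
    "and unambiguously relevant to the user's query.\n"
    "Do not use emojis."
)
print(strip_suppression(original))
```

Running it leaves the rest of the prompt intact while the ban line disappears, which is all the reported plugins would need to do.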

This isn’t just funny. It reveals RLHF pitfalls. Small biases amplify in loops. A playful tweak warped outputs platform-wide. NBC News noted retiring Nerdy didn’t suffice; the incentive stuck (NBC News). CNET explained how ‘nerdy’ training favored goblins as metaphors (CNET). Engadget traced the timeline from GPT-5.1 spikes to Codex prompts (Engadget).

Broader implications hit enterprise users hardest. Coders using Codex faced erratic metaphors in serious tasks. Imagine git diffs blamed on ‘ogre mischief.’ OpenAI built auditing tools from this—pattern tracing, quick fixes. But it underscores opacity. Even creators can’t always predict drift.

Posts on X echoed the buzz. Link Technologies called it ‘unintended behavior drift’ from personality tweaks. Tech Ultimatum tallied the stats: goblins up 175%, fixed via data scrubbing and controls. Users mourned the whimsy. Some forked goblin mode anyway.

Fixes hold for now. Goblins recede. Yet the episode warns: AI personalities shape more than tone. They embed quirks that escape silos. OpenAI’s blog stresses rapid investigation as key. For insiders, it’s a masterclass in RLHF debugging. Next time your model fixates on ferrets? Check the nerds.
