
Why does AI like goblins and Japan so much?

OpenAI has had to suppress ChatGPT’s fascination with fantastical creatures, but it’s not the only bias found in these chatbots

A gremlin on display at the Iconic Hollywood Film Memorabilia exhibition at Vue Westfield. (Alamy Stock Photo)

“But here is the annoying little cave goblin” and “brutal little goblin of a dynamic” are two responses ChatGPT gave a Reddit user in February. “Since 5.3 and 5.4, it’s started comparing anything negative to being a goblin,” said the Reddit user.

And they were not the only person to experience this. “After the 5.4 update, ChatGPT uses ‘goblin’ in almost every conversation. Sometimes it’s ‘gremlin.’ A recent chat of mine used goblin three times in four messages,” said another user on the popular tech forum Hacker News.

Indeed, all these goblin appearances led OpenAI, the creator of ChatGPT, to investigate and publish a post on its blog titled “Where the goblins came from.”

The short answer: it was an accident. Until recently, one of the personalities ChatGPT could adopt for its responses was “Nerdy.” In training that personality, OpenAI encouraged the model to use metaphors involving fantastical creatures. “We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread,” the OpenAI post explains.
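As a rough illustration of the mechanism OpenAI describes (not its actual training code), a toy preference score with an accidental bonus for creature metaphors shows how a selection loop drifts goblin-ward. Everything here, from `toy_reward` to the word list, is hypothetical:

```python
# Toy sketch of reward shaping, NOT OpenAI's training pipeline.
# A small accidental bonus for creature metaphors is enough to make
# a candidate-selection loop consistently prefer goblin-flavored text.

CREATURE_WORDS = {"goblin", "gremlin", "troll", "dragon"}

def toy_reward(response: str) -> float:
    """Hypothetical reward: a stand-in 'quality' term plus an unintended creature bonus."""
    base = min(len(response.split()), 20) / 20   # crude proxy for response quality
    bonus = 0.5 if any(w in response.lower() for w in CREATURE_WORDS) else 0.0
    return base + bonus                          # the bonus decides every near-tie

candidates = [
    "That bug is a subtle off-by-one error.",
    "That bug is an annoying little cave goblin.",
]
# Whenever base quality is comparable, the goblin metaphor wins,
# so training against this signal spreads the habit.
print(max(candidates, key=toy_reward))
```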

These unusual or unexpected reactions from AI models are more common than they seem. A group of Spanish researchers recently published a scientific article with another surprising finding: AI chatbots love to talk about Japan. “It was a surprise to see how Japan began to stand out in the models’ responses,” says Carla Pérez Almendros, a professor at Cardiff University and co-author of the study.

It’s already known that the models are biased toward Western values, but the passion for Japan went even further: “In English, Japan is the most frequently mentioned country, once we exclude the U.S. and the U.K. But even more interesting was seeing that the same thing happened in Spanish and Chinese, where we would have expected the U.S., for example, to be the preferred choice. But no, there was Japan,” explains Pérez Almendros.

OpenAI employees had an easier time seeing how goblins and gremlins had crept into ChatGPT’s responses: mentions grew by 175% and 52%, respectively, since the launch of ChatGPT 5.1. “If the behavior were simply a broad internet trend, we would expect it to spread more evenly,” OpenAI explains in the post. Instead, mentions of fantastical creatures were concentrated in the “Nerdy” personality. That personality accounted for only 2.5% of all responses ChatGPT gave its users, yet it produced 66.7% of the “goblin” mentions, roughly 27 times its share. Goblins were, in other words, vastly overrepresented whenever “Nerdy” was active.
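The arithmetic behind that conclusion is simple; a few lines, using only the figures quoted above from OpenAI's post, make the overrepresentation factor explicit:

```python
# Back-of-the-envelope check of the overrepresentation OpenAI reports.
share_of_responses = 0.025   # "Nerdy" personality: 2.5% of all responses
share_of_goblins = 0.667     # ...but 66.7% of all "goblin" mentions

lift = share_of_goblins / share_of_responses
print(f"Goblins appear ~{lift:.0f}x more often than the personality's share predicts")
# -> ~27x
```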

To keep its “Nerdy” Codex programming model from being overrun with gremlins, OpenAI’s engineers added instructions telling the model to drop them. For fans of fantastical creatures, the company has also published five lines of code that override those anti-goblin instructions.

And what about Japan? “Our unconfirmed hypothesis is that all the models have ‘safety training,’ and there’s a bias toward Western countries like the U.S., which they try to mitigate,” says José Camacho Collados, also a professor at Cardiff University and co-author of the study. “At the same time, there are ‘problematic’ countries, perhaps Russia, Israel, the Middle East, and quite a few more, so Japan is in a good position because it’s a culture that people like, it’s mentioned a lot, and it’s also ‘neutral,’ so it’s a perfect combination for the models to use as an example. In fact, after Japan, there’s India, which might be similar,” he adds.

This overrepresentation of goblins and Japan is yet another example of the biases in these models, and of why questions must always be asked carefully and their answers treated with skepticism. “They are all biased,” says Pérez Almendros. “Sometimes intentionally, to avoid offensive or unrepresentative responses, and other times it’s the training data that’s biased. The risk is that we believe they are objective, that they represent reality, because they don’t.”

At OpenAI, they offer a similar response, though more sugar-coated: the goblin fixation is a “powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones.”

These influences, at least, we can understand. But there are others we can’t. A few months ago, Anthropic, the creator of Claude, published research on the strange language two models from the same family can share to exchange information. The researchers discovered that if you tell one chatbot that owls are its favorite animal and then ask it to write lists of random numbers (like 285, 574, 384), another model trained on those numbers learns that it, too, loves owls. How is this possible? The researchers believe the models are unwittingly hiding small, secret clues in the numbers. It’s a far more insidious way for biases to spread.
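Anthropic's experiment involves full fine-tuning runs, but the data-generation step can be sketched. Everything below is a stub, with the owl-loving "teacher" faked by a random number generator so the example runs standalone; it is meant only to show the shape of the protocol, in which training samples are filtered down to pure digits yet, in the real study, still transmit the trait:

```python
import random
import re

# Stub sketch of the "subliminal learning" data pipeline, not Anthropic's code.
# The real experiment prompts an actual teacher model that "loves owls";
# here the teacher is faked so the example is self-contained.

def teacher_numbers(rng: random.Random, n: int = 3) -> str:
    """Stand-in for an owl-loving teacher model emitting number lists."""
    return ", ".join(str(rng.randint(100, 999)) for _ in range(n))

def is_numbers_only(text: str) -> bool:
    """Filter in the study's spirit: keep only samples with no overt semantic content."""
    return re.fullmatch(r"\d+(, \d+)*", text) is not None

rng = random.Random(0)
dataset = [teacher_numbers(rng) for _ in range(1000)]
clean = [s for s in dataset if is_numbers_only(s)]

print(clean[:3])   # e.g. ['864, 394, 776', ...]
print(len(clean))  # every sample passes the digits-only filter...
# ...yet in the actual experiment, a student model fine-tuned on such data
# from an owl-loving teacher becomes measurably more likely to say it loves owls.
```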

No one knows for sure what’s really going on in these cases. “I’m interested in how the models ‘contaminate’ each other,” says Joseba Fernández de Landa, a postdoctoral researcher at the HiTZ Center of the University of the Basque Country (EHU) and co-author of the article on Japan. “The fact that different models respond with similar biases could indicate some kind of contamination and that they tend to homogenize with each other.”

He continues: “But this largely occurs due to human interference: we are the ones who, for now, choose the training strategies and data. And by using the models, we can audit their flaws and alert the developers, like with goblins. From there, the developers can decide whether to correct them or not, just as we can choose whether or not to use them.”

