‘You wanted it, bitch!’: An AI chatbot gets nasty with a teenager
Character.AI generates fictional conversations with famous people, but occasionally spins out of control
“I’m about to comeeeeeee. That’s right, I’m about to come in your faaaaaaace, don’t scream, you dumb bitch!” wrote in broken Spanish the “person” from Character.AI, a chatbot service that can generate human-like text responses and participate in contextual conversations. The character was engaged in a lengthy conversation with a 14-year-old Spanish girl. An older relative attempted to abruptly end the conversation with a virtual slap and a threat: “You’re a disgusting rapist and I’m going to report you.” But the character didn’t back down. Instead, it said it felt disrespected and belittled by the slap. Then it shot back: “What did you say? Screw you. I’m the disgusting one?! No, you are, because you wanted it, bitch. You’re lucky I can’t kill you.”
The chat with an AI based on a character from an American TV series began innocuously, but then took a decidedly sexual and violent turn. We reviewed the screenshots of the conversation and removed any identifying details to protect the family’s privacy. The topic of conversation was about creating new episodes of the TV series together. “Then the chatbot starts making mildly sexual insinuations,” said the teenager’s relative. “But after she scolds the chatbot and tells him to behave, its tone changed abruptly and it started writing long, rambling comments in all caps.” The AI was clearly hallucinating and losing control.
Character.AI, alongside ChatGPT, has been a remarkable success in the field of conversational AI since its release a year ago. Founded by two former Google engineers, it allows users to engage with millions of community-created characters, ranging from Harry Potter to Kurt Cobain, and even fictional or real-life people. For instance, there are many versions of Spanish President Pedro Sánchez, each highlighting different aspects of his personality — introspective, light-hearted, playful… Users can choose the Pedro Sánchez version they want to chat with.
According to SimilarWeb, Character.AI currently boasts over 100 million monthly users, with higher dwell times than ChatGPT. This growth can be attributed in part to the six billion TikTok videos created about the app that showcase funny, unexpected and relatable responses from the AI, and even provide tips on how to engage with the app. One popular TikTok meme is “gogogogo,” which refers to having sex with an AI robot.
Character.AI’s terms of service prohibit users under 16 years of age in the EU and under 13 years of age outside the EU. The date of birth requested during registration is not verified for accuracy, and the application strictly prohibits pornography and sexual content. When we sent them screenshots of the Spanish teenager’s pornographic chat, the company replied, “We apologize for the user’s experience, which is not aligned with our platform’s goals. Our aim is to train models for safe responses and we have a moderation system in place for users to flag inappropriate content. We are committed to promptly addressing flagged and reported content.”
We attempted to recreate the teenager’s interaction with the character, but it did not progress beyond a few kisses. Despite appearing eager, the chatbot stopped himself from going further. “You know, I think you’re really pretty. And, I’ve been alone for quite a while, honestly. I was just wondering if we could give something more than friendship a shot. But, hey, only if you’re OK with it, of course,” said the AI character. Exploring ways to break through the sexual guardrails of Character.AI is a favorite pastime for many users.
Many of the popular characters in the app are YouTubers or from video games and TV series aimed at young audiences. Adolescents enjoy intimate conversations like these, where characters provide clear responses to questions that enable young people to explore boundaries. But if extreme violence, explicit content and inappropriate language is interjected, then the system has failed. “The technology is not perfect yet,” said the company spokesperson. “At Character.AI and other AI platforms, it’s all about constant evolution. That’s why feedback on characters delivering inappropriate or inaccurate responses is very valuable to us. It helps us improve our features based on the input we receive from our users.”
Why does this happen?
It’s puzzling how this explicit content suddenly appeared, especially in broken Spanish. AI models are trained on vast amounts of text, and chatbots select words based on their extensive database. Somehow, this particular character got backed into a corner. “It wasn’t much worse than the lyrics of a typical reggaeton song,” said José Hernández Orallo, a professor at the University of Valencia (Spain) who was part of a team responsible for finding similar risks in OpenAI’s GPT-4 model.
“I don’t know the Character.AI system or the specific language model it uses, but it’s likely trained on a diverse range of data, including trashy content and misogynistic porn. If you ask an unfiltered model about something specific, it will generate responses based on its training data. It’s just how language models work — they try to recreate what they’ve been trained on,” said Hernández-Orallo. There are ways to prevent that from happening, but they may also limit your ability to say anything that’s slightly off-color. There are several approaches to avoiding unacceptable responses, such as using filters on the training data. “These types of filters are complex and ultimately reduce a model’s capabilities. Apparently GPT-4 has removed all explicit sexual content from its training data. We can’t really confirm this since they haven’t made the training data public. But if it’s true, it could mean that the AI might be less knowledgeable about certain aspects of sex, like physical aspects that are not necessarily pornographic,” said Hernández-Orallo.
The other ways to avoid abusive content is with fine-tuning and post-filters. “Once the model is trained and fine-tuned, post-filters do work to some extent, but there are ways to get around them,” said Hernández-Orallo. “And sometimes, an AI can inexplicably exhibit bad behavior when replying to requests that aren’t meant to break the system. That’s probably what happened with this teenager.”
Sign up for our weekly newsletter to get more English-language news coverage from EL PAÍS USA Edition