Humanity’s genetic puzzle is being put together online: Could this be risky?

The recent theft of profiles from the company 23andMe and its financial agreements with pharmaceutical companies are a reminder of the dangers of uploading the world’s DNA to the cloud

Adriano Berjillos, who was affected by the data breach at 23andMe. Alejandro Ruesga

When he was little, Adriano Berjillos listened to his father tell stories about their family. He knew that he had a “great” ancestor from Germany. That’s where he got his pale skin and blonde hair, his father said. Thirty years later, an email from 23andMe told him the same thing. Berjillos paid around $100, spat into a test tube and a few weeks later was able to take a look at his own DNA. “Welcome to yourself,” the email said. The family legends became tangible when he discovered that, indeed, he had 4.5% Franco-German ancestry. That there was a good chance that he had pale skin and blonde or brown hair. That he didn’t have freckles. And that his big toe would be longer than the rest. “And it got everything right,” he says, surprised.

It also told him that he had less than 1% Ashkenazi ancestry, a Jewish community that settled in central Europe in the 9th century. This community, along with the Chinese, was the main target of a group of hackers who last September accessed the 23andMe databases to steal genetic profiles. They put Berjillos’ profile, and those of four million other users, up for sale on a deep web forum. It was assumed that some health insurers might be interested. Or some tabloids, since the data included, with names and surnames, the genetic profiles of famous people. Others suggested that the real purpose was to blackmail 23andMe by threatening to damage its reputation.

Berjillos isn’t all that interested in the case. In Europe, there is universal public healthcare, and he says he had already shared his data on the online forum Forocoches. He doesn’t think having a very long big toe is particularly interesting information or anything worth hiding. When he spoke with EL PAÍS he hadn’t even checked his email to see if his data had been stolen. “Hey, yes,” he added in a later audio message. “I checked and sure enough, they sent me an email.”

Data is the oil of the digital economy and in recent years, millions of people have uploaded their DNA data to the internet, which could be a problem. Maybe not for someone like Berjillos, but users with genetic malformations, public profiles and people belonging to ethnic minorities who face racist crackdowns (such as the Uighurs in China or the Rohingya in Myanmar) may have more misgivings.

Most clients (up to 80% in the case of 23andMe) allow their genome to be used for medical research. And this, in addition to leading to interesting studies, is of great benefit to DNA testing companies. In 2018, 23andMe reached a more than $300-million agreement with one of the largest pharmaceutical companies in the world, GlaxoSmithKline, for the “development of new drugs.” Before this deal, more than a dozen similar agreements had been signed. These data are usually anonymized, but a 2021 study in the journal Nature warned about residual risk of individual identification.

“Most large companies in the sector make profits by selling their clients’ genetic data,” the Spanish company 24Genetics states on its website. “We do not.” The group’s president, Nacho Esteban, explains that European laws offer more protections than American ones. “That is the Wild West of data,” he says. But he points out that the data is anonymized, and that its use for scientific purposes can be positive. “We, for example, did research on genetics and how it affects Covid. We did it for free and we were published in Nature.”

The data is kept online to be updated with the notable scientific advances that occur in this field. And to connect family members as they create profiles. It started as a kind of Facebook for genes, but little by little, it is putting together the puzzle of humanity’s DNA. There are already so many pieces that it is often possible to locate the missing ones.

It is estimated that having genetic profiles of 2% of the adult population of a country would be enough to locate the relatives of any citizen from an anonymous DNA sample. According to a study by JAMA Insights, more than 26 million tests were performed in 2021. Three years later, the two largest companies alone, 23andMe and Ancestry, exceeded that figure, reaching 33 million. The numbers are growing exponentially. “In the United States, the probability of locating someone is very high,” says Antonio Alonso, geneticist and director of Spain’s National Institute of Toxicology and Forensic Sciences.

Antonio Alonso, director of Spain's National Institute of Toxicology and Forensic Sciences, photographed at its headquarters in Madrid. Santi Burgos

Alonso explains that this has multiple implications. To begin with, in the field of police investigation, it will be possible to find almost any criminal with a DNA sample. In 2018, after more than 40 years, the Sacramento police arrested the so-called Golden State Killer, one of the most prolific serial killers and rapists in the history of the United States, thanks to gene banks. An investigator uploaded a genetic sample of the mysterious killer to GEDmatch. Investigators then began to find distant relatives, until they closed in on Joseph James DeAngelo Jr. The case made headlines around the world due to the notoriety of the killer, but it is far from being a one-off. “We have investigated 700 cases with this technique, many of them solved,” says Alonso. “Especially in the U.S., but also in Europe.”

Until now, a computer program called CODIS was used for these cases. This program, which was created by the FBI and is also used in many European countries, contains data from DNA profiles of inmates, forensic evidence, and missing people. It studies about 20 or 25 regions of DNA. But in recent years, due to the popularity of public databases managed by private companies, a new form of research has opened up. There are tens of millions of people in these databases, not a few thousand. And their analysis is much more exhaustive. “There, not 20, but 600,000 regions of the genome are being analyzed,” says Alonso. “That’s why they are so effective.”

On these public databases are people who want to know how much Viking blood they have, what genetic malformations they may develop or if they have a fourth cousin who lives in Australia. But in the process, they may be providing the information necessary to put a family member in jail. Before, only the needle was on record. Now the entire haystack is being searched, but in such detail and so methodically that any needle can be triangulated. Thanks to DNA databases, with the help of public records and social networks, it’s possible to reach the correct branch of the correct family tree, narrowing down the mystery person until the number of suspects is reduced to a dozen. “This does not take you directly to the criminal, but to a group of family members up to the fourth degree,” explains Alonso. “Then you have to investigate who that person is, the geographical and temporal dimensions, the age of the individual.”

CODIS has certain limitations. It only records the sex of the person, when technology allows for a much more detailed description. “This is the future, but it has to be accompanied by appropriate legislation,” says Begoña Sánchez, a Spanish police commissioner. “We are going towards massive sequencing.”

Mar Anes, adopted, has managed to locate distant family thanks to DNA testing.

Files are destroyed, genes are not

It is not only the police who are interested in triangulating an unknown person through genetic material. Alonso is helping to set up a state DNA bank for victims of the Spanish Civil War and the dictatorship. The tool will help identify remains from mass graves of murdered people, and will also include the genetic profiles of people allegedly affected by the stolen baby scandal, although he recently told EL PAÍS that he does not know of any cases.

This type of investigation is also often used by adopted people who are searching for their roots. Genetic testing first began as a way to find out whether they were predisposed to certain diseases, but with the advancement of biotechnology, it has become a tool to locate their biological family.

“Files can be destroyed, but genes cannot. The only thing that will never disappear is DNA,” says Beatriz Benéitez, the president of the association The Voice of the Adopted. She has never done DNA testing, but she has helped many people on this path. “I always guide them and advise them to do it,” she says. That’s what she did with Mar Anes. “My intention was to find a direct relative or someone who brought me closer to the same goal, perhaps a distant cousin,” explains Anes, a 52-year-old adoptee — and adopter — in an audio message. She got 1,500 results. One of the closest was Nelly, an 80-year-old fourth cousin. Now Nelly is also her friend and they talk almost every day. Sometimes she sends her a photo and Nelly replies: “You have eyes just like this or that cousin.” When she goes to the doctor, Nelly reminds her of their family’s medical history. They are simple exchanges, but for Anes they mean the world. Nobody had looked for a family resemblance until then, nobody had told her about her background. “It’s very nice,” she says.

Anes knows from her genetic profile that most of her family comes from the Spanish province of León, but genetic tests are not very popular there. That is why she has not been able to locate a closer relative. “I’m not doing an active search either,” she explains, “I don’t think I could handle another rejection.” But every few months, she checks her profile on 23andMe to see if there are any updates, if a new family member has taken the test. She knows it’s just a matter of time.

Nacho Esteban also believes this. He says the sector has grown enormously in recent years, and believes that it is not a passing fad but the start of a growing trend. “These technologies are increasingly capable of reading more data with greater precision and at a lower cost,” he says. This explains why DNA testing companies are seeing a rise in demand. There are clear and numerous benefits. But there are also risks, such as data leaks. Or possibly sending a distant (and guilty) relative to jail. But the thing is, even if your genetic information is not in one of these enormous DNA data banks, there will always be a distant cousin, aunt or nephew who has taken a DNA test. Your genetic material is already online, even if you have never been tested.
