First human pangenome reveals 120 million more letters in DNA

An international consortium published a genetic blueprint of humanity that better reflects the spectrum of human diversity and heralds a “fairer” era in medicine

The new "pangenome" better reflects the spectrum of human diversity. Photo: Vincent Koebel (NurPhoto via Getty Images)

On March 23, 1997, a man opened a newspaper and changed the world forever. An advertisement in The Buffalo News caught his eye: “Wanted: 20 volunteers to participate in the Human Genome Project, a very large international scientific research effort. [...] The outcome of the project will have tremendous impact on future progress of medical science.” He heeded the call, donated a few milliliters of blood and joined a $3 billion project that in 2003 produced the so-called human reference genome, 70% of which was made up of that man’s DNA, with contributions from two dozen other people. That genetic information did indeed change the history of mankind, but it was insufficient, because it left out the diversity of the human species. On Wednesday, an international consortium published a more sophisticated alternative, built from the genetic sequences of 47 people from different regions of the planet. It is the first draft of the so-called human pangenome.

A person’s genome, their DNA, is the instruction manual present in each of their cells. It is a text of over 3 billion letters (ATGGCGAGT...), in which each letter is simply the initial of a chemical compound with different amounts of carbon, hydrogen, oxygen and nitrogen. G, for example, is guanine: C₅H₅N₅O. The genomes of any two people are 99.9% identical, but the remaining 0.1% amounts to millions of letters that make each human being unique and can hide the keys to their diseases. Whereas the 2003 reference genome is a single linear sequence, the new human pangenome can be imagined as a road map in which an individual genome is just one path, according to Benedict Paten, a computational biologist at the University of California, Santa Cruz and one of the leaders of the study.
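The road-map picture can be made concrete with a toy example. The sketch below is only an illustration of the idea, not the consortium's data structure or software: the node sequences, the sample names and the helper functions `spell` and `differing_positions` are all invented here. Shared stretches of DNA become nodes, a variant becomes a fork in the road, and each genome is one route through the nodes.

```python
# Toy pangenome graph. Every sequence and name below is invented for
# illustration; real pangenome graphs hold billions of letters.
nodes = {
    1: "ATGGCG",  # stretch shared by everyone
    2: "A",       # one variant at a given position
    3: "G",       # an alternative variant at the same position
    4: "GTTCA",   # stretch shared by everyone
}

# Each genome is a path: an ordered list of node ids.
paths = {
    "linear_reference": [1, 2, 4],
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
}


def spell(path):
    """Concatenate the node sequences along a path to recover a genome."""
    return "".join(nodes[node_id] for node_id in path)


def differing_positions(seq_a, seq_b):
    """Return the 0-based positions where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


reference = spell(paths["linear_reference"])
for name, path in paths.items():
    genome = spell(path)
    print(name, genome, differing_positions(reference, genome))
```

A linear reference can only store one of the two routes. The graph keeps both, so a variant carried by `sample_b` is part of the map rather than a deviation from it.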

The new study adds 119 million letters to the model used thus far. The authors, grouped in the Human Pangenome Reference Consortium, explain that the low diversity of the current reference genome has caused “a streetlight effect,” a type of observational bias that occurs when people search for something only where it is easiest to look. The name comes from an old joke about a drunk man looking for his house keys on the ground, at night, under a streetlight. A policeman tries to help him and, after a few minutes of fruitless searching, asks the man if he is sure he lost his keys there. “No, I dropped them in the park, but this is where the light is,” the man replies. For two decades, scientists have been looking for genetic variants where it was easiest to look: within the limits of the reference genome, which, in addition to ignoring human diversity, was full of holes because the technology of the time lacked precision.

Benedict Paten and his colleagues have been working for years to develop new tools capable of reading DNA with unprecedented accuracy, with as little as one error every 200,000 letters. Several members of the team have also been involved in the T2T Consortium, which achieved the first truly complete sequence of a human genome a year ago. Until then, only 92% had been sequenced. The remaining 8% was similar to the blue sky pieces in a large puzzle: too repetitive to easily find where they fit.

Medicine that is “fairer”

Geneticist Karen Miga, of the University of California, Santa Cruz, said Tuesday at a press conference that the diversity of the pangenome initiates a new, “fairer” era in medicine. The 47 genomes incorporated so far come mainly from Africa (24) and the Americas (16), including four Peruvians from Lima, four Colombians from Medellín and eight Puerto Ricans. Six genomes are Asian and only one is from Europe, a continent that is already over-represented in genetic databases. The team’s goal is to reach 350 complete genomes in a single pangenome, to be published by mid-2024. The first draft was published this Wednesday in the journal Nature.

Computational biologist Benedict Paten of the University of California, Santa Cruz. Photo: UCSC

Spanish scientist Santiago Marco, who has developed algorithms and software tools for the pangenome, explained the magnitude of the technical challenge: today’s machines cannot read a genome in one go, but read billions of tiny fragments randomly and repeatedly. “Assembling a person’s genome is like reconstructing a big book with 3 billion letters, putting together paragraphs and disordered pages, as if it were a big puzzle,” said Marco, who works at the National Supercomputing Center in Barcelona, Spain. “The construction of a reference pangenome may require processing 100 times more information,” he warned.
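Marco's puzzle analogy can be sketched in a few lines. The code below is a deliberately simple greedy overlap merge, assumed here purely for illustration; it is not the method used by the consortium, and the function names `overlap` and `greedy_assemble` and the example fragments are invented. It repeatedly glues together the two fragments that share the longest end-to-start overlap.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Merge fragments by always joining the pair with the largest overlap.

    Toy version: it assumes error-free fragments and does not handle a
    fragment that lies entirely inside another one.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three shuffled, overlapping fragments of one short invented sequence.
reads = ["GAGTTCA", "ATGGCG", "GGCGAGT"]
print(greedy_assemble(reads))
```

On three short fragments this works, but the approach breaks down on repetitive DNA, where many fragments overlap equally well in several places. Repetitive stretches are far harder to place, and production assemblers rely on more elaborate methods.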

Francisco Martínez Jiménez, an expert in computational immunogenomics, uses the reference genome in his day-to-day work as a model to search for specific alterations in the tumors of patients at the Vall d’Hebron Institute of Oncology, also in Barcelona. This specialist explained that if the patient’s ancestors are, for example, from South America, Africa or Southeast Asia, it is “much more difficult” to detect these alterations, because the current reference genome is mainly made up of DNA from people of European origin. “Having genetic diversity in the pangenome is very relevant, particularly in cancer,” he said.

Martínez Jiménez analyzed the complete genomes of more than 7,000 primary and metastatic tumors from 71 types of cancer. The results of his study, also published Wednesday in the journal Nature, show that in certain types of tumors — such as prostate, thyroid and some breast cancers — the genetic differences between the primary cancer and its metastases are “very important,” while in others, such as pancreatic cancer, they are subtle. “Metastases per se do not seem to be explained, in general, by a specific genomic alteration, but possibly by changes in the tumor microenvironment, such as a deprivation of the immune system in certain locations or a greater irrigation by blood vessels, with more nutrients,” said Martínez Jiménez, who carried out his study at the University of Utrecht, in the Netherlands.

Biologist Benedict Paten insists that the human pangenome is for the moment a draft and asks for patience until a real impact on medicine is seen. “There are assembly errors — not too many, but some — that we knew we would make and that we want to correct,” he admitted. Another co-author of the study, Erik Garrison of the University of Tennessee, shared his enthusiasm in a statement. In his opinion, the first draft of the human pangenome is as “exciting and unexpected” as the first observations of unknown regions of our own planet or solar system, adding that in this case, however, it is something that could “define our physical nature.”
