The first complete sequence of a human genome marks a new era for science
An international consortium has shed light on an unknown world of repeated sections in people’s DNA, including essential information about genetic diseases such as cancer
A human being’s instruction manual is written into each one of their cells, in a DNA molecule that, unfolded, measures some two meters and contains the instructions to transmit a thought, perform a dance, or sing a song. The first complete reading of a person’s DNA, their genome, has revealed an unknown world of redundant sequences. It goes against intuition, but these repeated sections of a human being’s DNA help to make each person unique.
An international scientific team, called the Telomere-to-Telomere (T2T) Consortium, has this week published the first truly complete sequence of a human genome. Until now, only 92% had been read. The authors have compared their task to a giant puzzle of a city, in which there were still 8% of the pieces left to place, those of the blue sky – too repetitive to find their place. Technological advances have now allowed for those redundant sections to be put in order.
Bioinformatics specialist Daniela Soto, a member of the consortium, is jubilant. “We have opened a black box and now we can see 8% of the genome that we couldn’t see before,” says Soto, a 33-year-old researcher who is pursuing her doctorate at the University of California, Davis. “The impact on medicine is yet to be seen.” Her colleague Karen Miga, the co-director of the consortium, believes that the new genome will shed light on genetic diseases, human diversity, and the evolution of our species, according to a press release.
The complete genome of a human being is a text containing 3.055 billion letters (ATTGCTGAA…), in which each letter is simply the initial of a chemical compound with different amounts of carbon, hydrogen, nitrogen and oxygen. The C, for example, is cytosine (C₄H₅N₃O). New technologies, such as the sequencers from the British company Oxford Nanopore, allow for millions of letters to be read in a row, while the American platform PacBio allows a high-definition reading of up to 20,000 letters. It’s like putting on glasses to be able to see close up.
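To see why reading more letters in a row matters for repetitive DNA, consider a toy sketch in Python. It is only an illustration under invented assumptions, not the consortium's assembly method: the miniature "genome", the repeat unit (borrowed from the example letters above) and the function name `find_placements` are all hypothetical. A short read taken from inside a repeat fits in several places, like a piece of blue sky, while a long read that also covers the unique letters on either side fits in only one.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Continue one letter further on so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical miniature genome: the same repeated block appears twice,
# separated and flanked by short unique stretches.
REPEAT = "ATTGCTGAA" * 3
GENOME = "CCGTA" + REPEAT + "GGATC" + REPEAT + "TTCAG"

# A short read that lies entirely inside the repeated block.
short_read = "GCTGAAATT"

# A long read that spans one whole repeat plus its unique flanks.
long_read = "CCGTA" + REPEAT + "GGATC"

short_hits = find_placements(GENOME, short_read)
long_hits = find_placements(GENOME, long_read)

print(f"short read fits in {len(short_hits)} places: {short_hits}")
print(f"long read fits in {len(long_hits)} place: {long_hits}")
```

In this toy case the short read fits in four different positions, so it cannot be placed with confidence, while the long read fits only at the very start. Real sequencing data also contains reading errors and must be handled with far more sophisticated tools.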
The first complete genome of a person adds 200 million letters to the total, and corrects thousands of errors in the current reference sequence, which was published in 2001 and has been updated since then. The DNA of a cell is shared across 46 packets, called chromosomes, which in turn contain sections called genes. Some of these genes are instructions to make proteins, such as some hormones or the antibodies that protect us from a virus. The complete genome, published this Thursday in the journal Science, includes 99 previously unknown genes that appear to code for proteins, and another 2,000 apparent genes that will have to be studied in detail. The T2T Consortium already published a draft of its results nearly a year ago.
The surprising finding is that one person can have two copies of a gene while another can have 25 copies of the same gene. T2T has now joined forces with the Human Pangenome Reference Consortium (HPRC), another international project that seeks to obtain the complete genomes of 350 people of diverse racial groups in order to understand the role of these repetitions and to detect variants associated with illnesses such as cancer.
The genome used as a reference until now, called GRCh38, is a sequence full of gaps, and was created from scraps of DNA from different people. The new complete genome, called T2T-CHM13 and of mostly European ancestry, was obtained from a hydatidiform mole, a tumor derived from a human embryo that rejected the DNA of the mother and duplicated that of the father. This absence of a genetic mix facilitated a detailed reading of the genome.
To diagnose illnesses, doctors and scientists read fragments of their patients’ genomes and compare them with the reference genome, much as one checks a blue-sky puzzle against the picture on the box. Soto’s team has already analyzed the genomes of more than 3,000 people using the new model, and the results suggest that it avoids tens of thousands of errors, including in more than 200 genes involved in diseases. The National Institute of Standards and Technology in the US has stated that this complete genome could greatly improve research into genetic conditions, improving diagnoses in the medium term.