Analysis of 200 million known proteins suggests that humans have 13 unique three-dimensional protein structures

A new tool has successfully sorted the labyrinthine building blocks of life into clusters with similar structures for the first time

Human BPI protein (below), which is related to innate immunity, and a similar protein observed in bacteria. EMBL
Manuel Ansede

In 1961, American architect Irving Geis received an unusual commission: he was to draw by hand the first protein structure revealed through X-rays. It was myoglobin, which is responsible for oxygenating muscles and for giving flesh its red color. It is a sort of necklace with 153 pearls, which folds into eight tangled helices. It took Geis six months to draw it, but his effort succeeded in awakening worldwide fascination with this invisible inner world. Science has progressed since then. Last year, the Google-owned artificial intelligence company DeepMind managed to accurately predict the structure of over 200 million proteins, that is, almost all known proteins. Spanish bioinformatician Iñigo Barrio helped organize that chaos, grouping them by similar shapes. His work reveals surprising findings. Humans possess 13 unique structures, which do not appear in any other living beings. Acidobacterium, a ubiquitous bacterium in soil, has nearly 1,900 unique forms.

A living being’s DNA holds the recipes for making proteins, the basic building blocks of life. Humans have some 30,000 different types, which are responsible for essential functions such as generating energy, providing support and defending the organism against viruses. They are large and complex molecules; some are simple shapes — spheres, cylinders, rings, stars, spirals and even swastikas — and others are unimaginable structures, like hemoglobin, which transports oxygen through the blood from the lungs to the rest of the body. It has thousands of carbon, hydrogen, nitrogen, oxygen, sulfur and iron atoms. Hemoglobin’s formula is C₂₉₅₂H₄₆₆₄N₈₁₂O₈₃₂S₈Fe₄.
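To put the hemoglobin formula above in perspective, the following minimal Python sketch counts its atoms and estimates its molar mass. The function name `parse_formula` and the rounded atomic weights are assumptions made for this illustration; they do not come from the study.

```python
import re

# Rounded standard atomic weights in g/mol (illustrative values, an assumption of this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06, "Fe": 55.845}

# Map Unicode subscript digits to ordinary digits so the formula can be parsed.
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def parse_formula(formula):
    """Return a dict mapping each element symbol to its atom count."""
    plain = formula.translate(SUBSCRIPT_DIGITS)
    counts = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", plain):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


counts = parse_formula("C₂₉₅₂H₄₆₆₄N₈₁₂O₈₃₂S₈Fe₄")
total_atoms = sum(counts.values())
molar_mass = sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())

print(total_atoms)        # 9272 atoms in total
print(round(molar_mass))  # roughly 65,000 g/mol with these rounded weights
```

With this formula, a single hemoglobin molecule contains 9,272 atoms, more than half of them hydrogen.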

Barrio studied this tangle of molecules at the European Bioinformatics Institute in Hinxton, England. The researcher and his colleagues fine-tuned a new algorithm, called Foldseek Cluster, which can identify similar patterns among proteins. Barrio applied the tool to the AlphaFold database of 215 million proteins. The team identified 2.3 million types of structures, over 700,000 of them unknown. Understanding a protein’s structure is essential to understanding its function, and it could help to design new drugs. The researchers published their findings on Wednesday in the journal Nature.
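The general idea of grouping proteins by structural similarity can be illustrated with a toy example. The sketch below is not Foldseek Cluster, whose internals the article does not describe; it is a simple greedy scheme written for illustration. The similarity scores, the 0.5 threshold and the name `bacterial_X` are all hypothetical.

```python
def greedy_cluster(names, similarity, threshold=0.5):
    """Assign each item to the first representative it resembles, else start a new cluster."""
    clusters = {}  # representative name -> list of member names
    for name in names:
        for representative in clusters:
            if similarity(name, representative) >= threshold:
                clusters[representative].append(name)
                break
        else:
            # No existing representative was similar enough: open a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical pairwise structural similarity scores between 0 and 1.
SCORES = {
    frozenset({"human_BPI", "bacterial_X"}): 0.8,
    frozenset({"human_BPI", "myoglobin"}): 0.1,
    frozenset({"bacterial_X", "myoglobin"}): 0.1,
}


def toy_similarity(a, b):
    """Look up a made-up similarity score; identical items score 1.0."""
    return 1.0 if a == b else SCORES.get(frozenset({a, b}), 0.0)


clusters = greedy_cluster(["human_BPI", "bacterial_X", "myoglobin"], toy_similarity)
print(clusters)
# {'human_BPI': ['human_BPI', 'bacterial_X'], 'myoglobin': ['myoglobin']}
```

In this toy run the human protein and the hypothetical bacterial one land in the same cluster, while myoglobin forms a cluster of its own. A real tool would compute the similarity from the three-dimensional structures themselves rather than from a lookup table.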

“There is almost always a relationship between the structure of a protein and its function. Almost always. In biology, you should never say always,” says Barrio, who recently joined the Wellcome Sanger Institute, also in Hinxton, very close to Cambridge. His work has succeeded in linking proteins of known function with unexplored ones. “If proteins A and B have a very similar structure, you can infer that they will have a similar function,” the researcher explains. His work is reminiscent of an archaeologist extracting mysterious prehistoric tools from the subsoil. “If you discover something in the shape of a beak, you can intuit that it is used for chopping, but there are exceptions. A fork and a comb look very similar, but they are not used for the same thing,” he observes.

The AlphaFold database includes predictions developed by DeepMind and the European Bioinformatics Institute, part of the European Molecular Biology Laboratory, an organization with over 1,800 employees at sites in Spain, France, Germany, Italy and the United Kingdom. Analysis of the 215 million proteins suggests that most of the structures appeared very early on in the evolution of living beings, in the common ancestor of animals and plants or even earlier. Only 4% of the configurations appear to be specific to a single species.
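A figure such as the share of configurations specific to a single species can be derived once every protein has been assigned to a cluster. The following is a minimal sketch on made-up data, assuming each cluster is a list of (protein, species) pairs; the cluster and protein identifiers are hypothetical and the researchers' actual pipeline is not described in the article.

```python
def species_specific(clusters):
    """Return the ids of clusters whose members all come from one species."""
    return [
        cluster_id
        for cluster_id, members in clusters.items()
        if len({species for _, species in members}) == 1
    ]


# Hypothetical toy data: cluster id -> list of (protein id, species) pairs.
toy_clusters = {
    "c1": [("p1", "Homo sapiens"), ("p2", "Escherichia coli")],
    "c2": [("p3", "Homo sapiens")],
    "c3": [("p4", "Araneus ventricosus"), ("p5", "Araneus ventricosus")],
    "c4": [("p6", "Mus musculus"), ("p7", "Homo sapiens")],
}

specific = species_specific(toy_clusters)
share = len(specific) / len(toy_clusters)

print(specific)  # ['c2', 'c3']
print(share)     # 0.5
```

In this toy data set half of the clusters are specific to one species; in the study the reported share is only 4%.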

“Humans have 13 protein groups with unique structures,” Barrio explains. That figure contrasts with those of the five organisms with the most unique three-dimensional shapes: the bacteria Acidobacteria bacterium, Escherichia coli and Chloroflexi bacterium, the Asian spider Araneus ventricosus and the Pharaoh cuttlefish, with between 1,400 and 1,900 unique structures each. “We tend to see evolution as a linear process, but it’s more like a tree. We are at the end of a branch, but bacteria have continued to evolve on their own branches. There are bacteria [that are] newer than us,” the bioinformatician explains. “Moreover, developing a new structure for a new problem is not always the best way to evolve. Often, structures are recycled. There are proteins in the human species that possibly have a different function than the one they had in our ancestors,” Barrio says.

Bioinformatician Iñigo Barrio, photographed this Wednesday at the Wellcome Sanger Institute in Hinxton, England. Wellcome Sanger Institute

The British company DeepMind boasts that its artificial intelligence system achieves 95% accuracy. However, nine of the 13 uniquely human structures are based on predictions with high uncertainty; according to Barrio, that might be because they are particularly disorganized conformations. The remaining four are VPS53, which is involved in transport within cells; U54, a herpes virus protein integrated into the human genome; annexins, which are involved in crossing cell membranes; and a fourth, little-studied protein that may be no more than a fragment. The 30,000 types of human proteins are grouped into some 9,000 structures.

Another of the study’s lead authors, Portuguese bioinformatician Pedro Beltrao, highlights the finding of human proteins involved in the immune system that are very similar to other bacterial proteins of unknown function. “This suggests that proteins involved in the immune system could have an ancient evolutionary origin, which we share with bacterial species. If true, this could transform what we know about immunity,” Beltrao, of the Swiss Federal Institute of Technology in Zurich, Switzerland, said in a statement.

Biologist Júlia Domingo considers the new research, in which she did not participate, “very necessary.” She explains that “we are entering a new era of massive data, and we need new tools to process, analyze and use it at high speed.” Along with other colleagues at the Center for Genomic Regulation (CRG) in Barcelona, Spain, Domingo developed a method to identify the hidden buttons that change the function of proteins. Domingo says that structure is not enough to accomplish that task. “Other layers of functionality are involved, such as energies and affinity for other proteins,” she notes.

It took architect Irving Geis six months to draw myoglobin in 1961. The British chemist who provided him with the data, John Kendrew, won the 1962 Nobel Prize in Chemistry for reading that first structure with X-rays. The possibilities that artificial intelligence and new algorithms open up now are unimaginable, as Iñigo Barrio explains. “With previous methods, we would have needed 10 years to do this work. It took us five days,” he says.
