Artificial intelligence uncovers the building blocks of life and paves the way for a new era in science
DeepMind, a company bought by Google, predicts with unprecedented precision the 3-D structure of nearly all the proteins made by the human body
Humans have been seeking the answer to a colossal challenge for half a century: identifying the basic building blocks of life, essential knowledge in the battle against terminal illnesses. Water is easy to imagine. It is simply two hydrogen atoms bonded to one oxygen atom: H₂O. But the protein that enriches blood, hemoglobin, carries the less memorable formula C₂₉₅₂H₄₆₆₄N₈₁₂O₈₃₂S₈Fe₄. In 1969, US biologist Cyrus Levinthal calculated that it would require more time than has passed since the formation of the universe – around 14 billion years – to untangle all the potential configurations of a single protein from its sequence of amino acids, the links that make up these macromolecules. However, an artificial intelligence system created by the Google conglomerate has achieved this feat in just a few minutes. Its predictions for practically every single human protein were published last week in a giant step for biology that has removed a blindfold from human knowledge.
At the vanguard of this revolution is British neuroscientist Demis Hassabis. The 44-year-old researcher was a child chess prodigy who in 1997 was left in awe of the match between Russian grandmaster Garry Kasparov and the supercomputer Deep Blue. The machine beat the man, but Hassabis was left with the impression that it was a cumbersome piece of junk, and useless in a game of tic-tac-toe. When the final game was over, the University of Cambridge undergraduate came up with the idea of designing a machine that could learn any game. In 2010, Hassabis founded a company called DeepMind to lead the research drive toward Artificial Intelligence (AI). In 2013, his first creation taught itself to play and win at various games on the legendary Atari console. In 2014, Google bought DeepMind for $650 million (around €500 million at the exchange rate of the time).
After limbering up on video games, the scientists at DeepMind then set themselves the task of solving one of the greatest challenges in biology. Proteins – like hormones, enzymes and antibodies – are tiny machines that carry out the basic functions of life. They are made up of chains of smaller molecules, amino acids, much like a pearl necklace. These necklaces are folded in convoluted configurations that determine their function. Antibodies, the human body’s defense mechanism against invaders like the coronavirus, have a Y shape.
The recipes for all the proteins an organism needs to function are written in the DNA of every cell. The DeepMind system, baptized AlphaFold, reads this information – a sequence of amino acids – and predicts the structure of each protein. Its precision is similar to that achieved in laboratory experiments, which require considerably more time and money. It is like guessing the structure of a quiche after seeing pie crust, eggs, pepper, salt, milk and cream for the first time.
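The step from DNA recipe to amino acid sequence can be illustrated with a short Python sketch. It uses the standard genetic code, in which each group of three DNA letters (a codon) stands for one amino acid. The DNA fragment and the function name `translate` are chosen purely for illustration, and this is not part of AlphaFold, which starts from the amino acid sequence itself.

```python
# Minimal sketch: turn a DNA coding sequence into a chain of amino acids
# using the standard genetic code. Illustrative only; not AlphaFold code.
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for all 64 codons, in TCAG order ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Read the DNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon: the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative DNA fragment (an assumption for the example).
print(translate("ATGGTGCTGTCTCCTGCCGACAAGACC"))  # prints MVLSPADKT
```

The resulting string of letters is the "pearl necklace" before it folds. Predicting how that chain curls up in three dimensions is the hard part that AlphaFold tackles.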
DeepMind and the European Molecular Biology Laboratory (EMBL) published more than 350,000 structures on July 22, including some 20,000 human proteins and those of 20 other organisms, such as the lab mouse and the tuberculosis bacterium. Venki Ramakrishnan of the Medical Research Council Laboratory of Molecular Biology in Cambridge and winner of the 2009 Nobel Prize in Chemistry, says it is “an astonishing advancement” with unpredictable consequences. “It has taken place long before many experts had predicted. It is going to be exciting to see the many ways in which it is going to radically change biological investigation.”
Some organizations are already working with the new database. The Drugs for Neglected Diseases initiative, a global non-profit set up with the aid of Doctors Without Borders, uses the structure of proteins to seek new treatments. Practically all diseases, from cancer to Alzheimer’s, and including Covid-19, are related to the structure of one protein or another. Other institutions, such as the University of Portsmouth in the UK, are using the program to try to design proteins capable of recycling plastic.
Hassabis, chief executive of DeepMind, has announced that the company plans to publish 100 million structures over the next few months. The idea is to offer, free of charge, predictions for the structure of practically every protein with a known sequence of amino acids. “We believe that this is the most important contribution to date that artificial intelligence has made to scientific knowledge,” he said following the publication of DeepMind’s research in the scientific journal Nature.
The AlphaFold system was not created from nothing, as Edith Heard, director general of the EMBL, has emphasized. “AlphaFold has been trained using data from public resources developed by the scientific community, so it makes sense that its predictions are also made public.” In Heard’s view, the system represents “a genuine revolution in life sciences, like the genome was decades ago.”
To determine the real structure of a protein, hugely expensive infrastructure is required, such as the European Synchrotron Radiation Facility, a particle accelerator covering almost a square kilometer in Grenoble, France. The radiation emitted by the electrons that circle the ring, which basically consists of X-rays, allows researchers to observe the secrets of matter. Spanish biologist José Antonio Márquez explains that elucidating the shape of a protein with a synchrotron, or with the alternative method of cryogenic electron microscopy, could require “months or even years.” AlphaFold can achieve it in minutes, albeit with errors.
“We are talking about computer-generated predictions, not the experimental determination of the structure. And the precision is 58%,” says Márquez, a 52-year-old researcher who heads the Crystallography Platform at the EMBL in Grenoble. As things stand, if a scientist wants to study a protein connected to cancer, it could take months or years to analyze its structure. There are only around 180,000 structures in available databases. The information published by DeepMind has more than doubled that number. And in a few months millions will be available. “Today it is common to not find a protein in the databases. With AlphaFold you can obtain a prediction with a 58% reliability. You save an enormous amount of time,” says Márquez, who did not participate in the project. The system’s imprecisions are concentrated in specific regions of the proteins that lack a fixed structure so that they can adapt to their environment.
Márquez points out other limitations. The DeepMind system can predict the structure of an isolated molecule, but proteins tend to interact with other proteins. AlphaFold is not yet capable of predicting the structure of these complexes but it is a system designed to learn on its own. Márquez is optimistic: “It will speed up discoveries in practically all areas of biology.”