David Baker, biochemist: ‘Now we can build completely new proteins to do exactly what we want’

The researcher is one of the creators of deep learning artificial intelligence systems that generate proteins that have never existed in nature, which could usher in a new era for science and medicine

American biochemist David Baker. FBBVA

American biochemist David Baker is leading a technological revolution that has the potential to change science and medicine forever. In order to understand its full potential, one has to go to the very depths of a living being: its DNA. This molecule stores all the necessary instructions to make proteins through different combinations of the letters ACGT. Proteins, in turn, are responsible for almost any biological process you can think of, from a tree’s growth to a firefly’s glow, as well as a person’s ability to think, breathe, digest and everything in between.

Understanding how proteins take shape has been one of the most complicated problems in biology for more than half a century. The transcription and translation of DNA within cells produce a linear sequence of amino acids arranged in single file; but that string twists on itself in mere fractions of a second to form perfect three-dimensional structures capable of embracing, cutting, pasting, absorbing, transmitting and producing. Nothing in the one-dimensional DNA or amino acid sequences hints at the shape that the final molecule will take, and it is precisely its shape that will determine its function. Calculating all the possible shapes of a single protein with conventional methods could take more than 13.7 billion years – the age of the universe. And there are hundreds of millions of different proteins in nature.
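To get a feel for why checking every possible shape is hopeless, here is a rough back-of-the-envelope sketch. The figures in it are illustrative assumptions, not numbers from Baker or DeepMind: a chain of 100 amino acids, three possible conformations per amino acid, and 10 trillion shapes tested per second.

```python
# Rough estimate of a brute-force search over protein shapes.
# All parameters are illustrative assumptions, not measured values.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.7e9  # figure quoted in the article


def exhaustive_search_years(residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to try every shape of a chain, one shape at a time."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(100)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years, about {ratio:.2e} times the age of the universe")
```

Even with these generous assumptions, the search would outlast the universe many times over, which is why pattern-finding methods such as deep learning were needed rather than faster enumeration.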

Last summer, AlphaFold, an artificial intelligence system developed by DeepMind – a Google-owned company – predicted the shape of almost all known proteins: more than 200 million. The historic achievement was made possible by deep learning systems. These sets of algorithms mimic the workings of neurons in the human brain and, even though they are still far from matching our organ’s capacity, they are very efficient at finding patterns in huge databases. Thanks to these systems, predicting the shape of a protein is now done in minutes instead of years.

David Baker’s laboratory at the University of Washington has gone one step further, developing several open artificial intelligence systems that create proteins that have never existed in nature. The RoseTTAFold system and its successors make it possible to design, with unprecedented ease, new proteins with amazing functions, such as blocking all variants of Covid-19 or fighting diseases with no known cause, such as Crohn’s or idiopathic pulmonary fibrosis. His team is perfecting a system of tools to “say a protein,” that is, to describe its function out loud and have the computer provide its complete sequence. It also seeks to auto-complete a protein of which only part was provided, as if it were a Google search.

Baker has just received the BBVA Foundation’s Frontiers of Knowledge Award in Biomedicine, along with DeepMind’s Demis Hassabis and John Jumper. In this interview, conducted by videoconference, he talks about the enormous potential of this technology. One of his most achievable goals is to create a nasal spray that blocks the entry of respiratory viruses like the flu virus, the respiratory syncytial virus, the coronavirus and other winter respiratory pathogens, thanks to artificially-engineered proteins.

Question. You say that this technology will change the world more than the Stone Age or the Industrial Revolution. Why?

Answer. Until very recently, all the proteins we knew came from an evolutionary process. Kind of like some kind of ancient elven language that was passed down to us. And the way that people have approached engineering and biology is to take these ancient proteins and modify them a little bit. That’s the analogy with the Stone Age; that’s how Stone Age technology worked. You modified sticks and stones a little bit. And so now that we can build completely new proteins from first principles to do exactly what we want to do, that’s much more like human technology outside of biology’s possibilities.

Q. What applications will it have?

A. We’re first going to see an effect in medicine, with new designs of medicines, more effective and cheaper. Maybe five, six, seven years ago, we developed a protein shell shaped like an icosahedron [a polyhedron with 20 faces]. They’re completely designed. And then my colleague Neil King’s group put the coronavirus receptor-binding domain on the surface, and it turned out to elicit a really, really strong response. So Neil’s coronavirus vaccine [has been approved and] it’s in use in humans in Korea. We’re also working on photonics to improve solar panels and new cancer therapeutics, or making new materials. The possibilities are almost endless.

Q. What are the limits of this technology?

A. One way to get an idea of the limits is just to look at evolution and all the living things and all the different things that living things do, which are all pretty much done by proteins. And think that this is just what happened by random chance. You know, evolution doesn’t really have a plan. It’s just random mutation and selection. Suppose now humans were actually going out and designing proteins to solve problems. The possibilities are way beyond anything that any of us can imagine now.

Q. Your new system can design proteins on demand by talking to the computer. Can you, for example, ask for a protein that cures Alzheimer’s?

A. We can now give a simple description of the problem and the deep learning methods will generate proteins that have those properties. But it’s still not perfect. Once you’ve designed a protein on the computer, then you have to make it in the lab and see whether it actually does what you want it to do. And then once you have that designed amino acid sequence, you can encode it in a piece of synthetic DNA, like a synthetic gene. And then you put that synthetic gene into bacteria. And the bacteria basically become bags of your protein. Now, could we just say “design a cure for Alzheimer’s”? The hard part is that we don’t totally understand Alzheimer’s. We can design all sorts of different complicated proteins with different functions. I think now really the challenge is what problems do we now set out to solve? And like you say, we want to cure Alzheimer’s, but how do we do that? We need to have a kind of a molecular hypothesis. You have to understand the origins of a disease.

Q. How reliable is this technique?

A. It really depends on what the problem is. For some problems, the success rate might be 75%. We’re still learning because this is all very new. But for the flu virus, we were able to make designs and test them in under two weeks, which is really exciting because one of the things where we think this could be very powerful is for pandemic preparedness. But then for other problems, like if you say “make an enzyme to degrade plastic,” we still need to do a lot more work than that. So it depends on the difficulty of the problem.

Q. One of your goals is to develop a nasal spray that could be used to protect ourselves from several respiratory viruses at once. When do you think it would be possible to have it?

A. There are economic questions. There’s not so much money to be made in nasal sprays. So the real question is, will there be an organization, like a pharmaceutical company or a governmental organization, that would be interested, that would be willing to pay for the cost of distributing that to the world? This is generally a problem with infectious diseases. But I think we’re going to learn within the next year about the efficacy of this nasal spray in humans for Covid-19. And if it is efficacious, it seems very, very reasonable that the same concept would apply to all respiratory viruses.

Q. Do you see any dangers in this technology?

A. Nature has already perfected ways of causing death and destruction on a huge scale. The 1918 Spanish flu spread around the world and was just incredibly lethal. So if you’re a bad actor, you don’t need protein design to do really bad things because, you know, the sequences of these things, also Ebola virus, are known. So like with any new powerful technology we’ll have to think about how to make sure it isn’t used for bad purposes, but I think there’s already so many known viruses out there which are human killing machines that for now I think there’s not too much risk of humans coming up with new ones.

Q. Do you see any cause for concern in the fact that DeepMind is owned by Google and that they are so secretive about the work they do?

A. There’s a contrast between my lab, because we have visitors coming in all the time and people going out and we share information everywhere, and a company like DeepMind, which is a totally closed system. When DeepMind announced the breakthrough AlphaFold 2 a little over two years ago, there was a lot of gloom and doom in the academic community. Everyone said that big tech companies were going to be the only ones who could really play in this space and develop deep learning methods, and academic science was kind of over. I think that’s clearly not the case. If you just look at what’s been done in the two years since AlphaFold, almost all the advances have been in the academic community. DeepMind really hasn’t done much in this area. Meanwhile, my group and other groups have had huge advances in protein design and other things. What ended up happening is that DeepMind made AlphaFold freely available, which is really wonderful. There may have been people at Google who wanted to try and monetize it, commercialize it, but maybe we helped them make the argument that they should just give it out and they did give it out. But it is still true what you said, that DeepMind is so secretive. That asymmetry comes at a cost, too, because [for example] yesterday I had a visitor from Cambridge, England, whom we’re collaborating with on making new Alzheimer’s drugs. And I also have a visitor now, also from England, working on new ways of harvesting light energy, photonics to improve solar panels. And yesterday, I had a long call with one of the leading cancer researchers at the NIH [National Institutes of Health] on new cancer therapeutics. So by being an open system, you can have so many more things and ideas come in. This free exchange of information, I think, really benefits scientific progress.
