AI simulates 500 million years of evolution to discover artificial fluorescent protein
A startup founded by former Meta researchers is creating an alternative evolutionary path using the most powerful computational resources ever applied in biology, according to the company
Scientists have long pondered whether evolution could have unfolded differently. For instance, was it inevitable that humans would emerge, or are we the product of a series of natural accidents that might never have happened, in which case the world would be entirely different? While there is no definitive answer, artificial intelligence (AI) is now capable of conducting evolutionary experiments. One such study, published last week in the journal Science, reveals alternative pathways in the design of a type of protein that nature did not explore. This technology holds the potential to offer valuable insights for developing new therapies and other applications.
In his 1989 book Wonderful Life, evolutionary biologist Stephen Jay Gould posed a thought experiment: If the tape of life’s evolution on Earth could be rewound to the beginning and played again, would the outcome be the same as we know it, or something entirely different? Gould argued for the latter. Using the analogy of a replayed tape, he suggested that evolution would have followed a very different path, and humans would not have emerged. “Replay the tape a million times … and I doubt that anything like Homo sapiens would ever evolve again,” he wrote.
Gould’s thesis has sparked widespread debate ever since, with some advocating for determinism and others supporting contingency. In his 1952 short story A Sound of Thunder, science fiction author Ray Bradbury recounted how a time traveler’s simple act of stepping on a butterfly in the age of the dinosaurs changed the course of the future. Gould made a similar point: “Alter any early event, ever so slightly and without apparent importance at the time, and evolution cascades into a radically different channel.”
Speaking the language of proteins
Scientists have been exploring this problem through experiments designed to recreate evolution in the lab or in nature, or by comparing species that have emerged under similar conditions. Today, a new avenue has opened up: AI. In New York, a group of former researchers from Meta — the parent company of social networks Facebook, Instagram, and WhatsApp — founded EvolutionaryScale, an AI startup focused on biology. The EvolutionaryScale Model 3 (ESM3) system created by the company is a generative language model — the same kind of platform that powers ChatGPT. However, while ChatGPT generates text, ESM3 generates proteins, the fundamental building blocks of life.
ESM3 feeds on sequence, structure, and function data from existing proteins to learn the biological language of these molecules and create new ones. Its creators trained it on 771 billion tokens (the units of data a language model reads) derived from 3.15 billion sequences, 236 million structures, and 539 million functional traits. Training required more than one trillion teraflops of computation (a teraflop here being one trillion arithmetic operations), which is the most computing power ever used in biology, according to the company.
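To put the compute figure in perspective, the short sketch below works out the arithmetic. It assumes that "teraflops" is used here as a count of operations (one teraflop being 10^12 floating-point operations) rather than as a per-second rate; that reading is an assumption, not something the article states.

```python
# Assumption: "one trillion teraflops" counts total floating-point operations,
# with one teraflop taken as 10**12 operations (not a per-second rate).
TRILLION = 10**12
OPERATIONS_PER_TERAFLOP = 10**12

# One trillion teraflops expressed as a plain number of operations.
total_operations = TRILLION * OPERATIONS_PER_TERAFLOP

# Count the zeros to express the total as a power of ten.
exponent = len(str(total_operations)) - 1

print(f"Total operations: 10^{exponent}")
```

Under that reading, the training run amounts to more than 10^24 individual arithmetic operations.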
“ESM3 is a step toward a future in biology where AI is a tool for building from first principles, just as we build structures, machines, and microchips,” says Alexander Rives, co-founder and chief scientist of EvolutionaryScale and leader of the new study. He sees biology as the most advanced technology ever created, and one that is programmable using a common alphabet: the genetic code, which is translated into amino acids, the building blocks of proteins. “ESM3 understands all this biological data, translates it, and speaks it fluently to use as a generative tool.”
The protein that wasn’t
Rives and his collaborators applied ESM3 to the task of creating a new green fluorescent protein (GFP). GFP is a naturally occurring protein that glows green under ultraviolet light and is commonly used in research as a marker. The first GFP was discovered in a jellyfish, but other versions can also be found in corals and anemones. The scientists trained ESM3 to generate a new GFP, and the result surprised them: a fluorescent protein, which they named esmGFP, that is only 58% similar to the most closely related GFP. According to the researchers, this is equivalent to simulating 500 million years of evolution. ESM3 is now available to the scientific community as a new tool for designing proteins with therapeutic functions, environmental remediation capabilities, and other potential applications.
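One common way to put a number on how similar two proteins are is percent sequence identity: the share of aligned positions where both proteins carry the same amino acid. The sketch below assumes the 58% figure refers to a measure of this kind. The function name and the two short sequences are hypothetical and made up for illustration; they are not real GFP or esmGFP sequences, and the inputs are assumed to be already aligned, with "-" marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions where both sequences have the same residue.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy, invented sequences (one letter per amino acid), for illustration only.
natural = "MSKGEELFTG"
designed = "MSKGAELYTA"
print(f"{percent_identity(natural, designed):.0f}% identical")
```

By this measure, a protein that is 58% similar to its closest known relative differs from it at roughly four out of every ten positions.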
Thus, AI has uncovered a path that nature could have taken 500 million years ago but, for reasons unknown, did not. Rives and his collaborators explain that just a few mutations in GFP can destroy its fluorescence. Nevertheless, ESM3 has found a new realm of fluorescent proteins that could have existed but did not. “Underlying these sequences is a fundamental language of protein biology that can be understood using language models,” says Rives.
According to Jonathan Losos, a professor at Washington University in St. Louis who studies the concept of evolutionary rewinds by observing species in the wild, “This study is a brilliant example of how there are many ways evolution could have proceeded.” He sees the findings as supporting the idea of contingency, a viewpoint famously argued by Gould.
This view is also shared by Zachary Blount, a professor at Michigan State University, who demonstrated the contingency of evolution in a renowned bacteria-growing experiment started in 1988 by his former supervisor, Richard Lenski. The experiment is still ongoing, spanning more than 80,000 generations.
“The study shows that there are viable biological possibilities that haven’t evolved (we think) on Earth, suggesting genuine paths that evolution could have taken but didn’t because the necessary history didn’t happen,” Blount says. He also notes that there is some level of determinism in nature, as evidenced by the fact that the protein designed by ESM3 is still 58% similar to other GFPs. While Blount doesn’t believe AI will ever fully solve the rewind problem, he acknowledges its potential to help us understand what is contingent, what is not, and why. “It gives us ways to explore the realm of biological possibilities, allowing us to compare what’s biologically possible to what exists or has existed.”