ADC Reviews

After AlphaFold – another breakthrough in protein design

ProteinMPNN

Huge advances in artificial intelligence (AI) mean that researchers can design completely original molecules in seconds instead of months.

In June, South Korean regulators approved GBP510, a COVID-19 vaccine and the first medicine made from an entirely new, human-designed protein. The vaccine is based on a spherical protein nanoparticle that researchers created nearly a decade ago through labor-intensive trial and error.

And in July of this year, DeepMind revealed that the latest version of AlphaFold had predicted the structures of all known proteins. In recent months, there has been an explosion of AI tools that can rapidly create entirely new proteins. “The way things work in protein design has changed since AlphaFold, and we’re witnessing a very exciting era,” said computational biologist Noelia Ferruz of the University of Girona in Spain.

Further good news came on September 15, when the team of biochemist David Baker at the University of Washington published two back-to-back papers in Science. One describes ProteinMPNN, a new method that can design proteins in seconds instead of months; the other shows that a broad range of symmetric protein homo-oligomers can be generated using a method called hallucination. Together, the papers show that machine learning can be used to create protein molecules more accurately and faster than before.

The beginning of a dream
The Baker lab has been making new proteins for the past 30 years. Software called Rosetta, developed in the 1990s, splits the process of making new proteins into steps. Researchers first imagine the shape of a new protein (usually by piecing together fragments of other proteins) and then use the software to deduce the amino acid sequence that corresponds to that shape.

But these “first draft” proteins rarely fold into the desired shape when made in the lab, so the sequence must be tweaked further until it folds only into the desired structure. “This step involves simulating all the ways different sequences might fold and is computationally expensive,” said Sergey Ovchinnikov, an evolutionary biologist at Harvard University who worked in Baker’s lab. “It could take 10,000 computers running continuously for weeks to complete.”

But by tweaking AlphaFold and other AI programs, this time-consuming step can be done in an instant. In the “hallucination” method developed by Baker’s team, researchers feed random amino acid sequences into a structure-prediction network and repeatedly alter them so that the predicted structures become more and more protein-like, as judged by the network’s own predictions. In a 2021 paper, Baker’s team made more than 100 small “hallucinated” proteins in the lab and found that about one-fifth of them resembled the predicted shape.
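The hallucination loop described above can be pictured as a simple search: start from a random sequence, mutate one position at a time, and keep changes that a scoring function prefers. The Python sketch below is only an illustration under stated assumptions. The function toy_confidence is a hypothetical stand-in for a structure-prediction network (it merely rewards an alternating hydrophobic/polar pattern), and the Metropolis-style acceptance rule and all parameter values are choices made for this example, not details taken from the Baker lab papers.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")


def toy_confidence(seq: str) -> float:
    """Hypothetical stand-in for a structure predictor's confidence (0 to 1).

    Assumption for illustration only: it rewards hydrophobic residues at even
    positions and polar residues at odd positions. A real pipeline would call
    a structure-prediction network here instead.
    """
    hits = sum((aa in HYDROPHOBIC) == (i % 2 == 0) for i, aa in enumerate(seq))
    return hits / len(seq)


def hallucinate(length=30, steps=2000, temperature=0.01, seed=0):
    """Mutate a random sequence, keeping changes the scorer prefers.

    Returns (best_sequence, starting_score, best_score).
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_confidence("".join(seq))
    start_score = score
    best_seq, best_score = "".join(seq), score

    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-residue mutation
        new_score = toy_confidence("".join(seq))

        # Metropolis-style rule: always accept improvements, and occasionally
        # accept a worse sequence so the search can escape local optima.
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old  # reject the mutation and restore the residue

    return best_seq, start_score, best_score


if __name__ == "__main__":
    designed, before, after = hallucinate()
    print(f"score before: {before:.2f}, after: {after:.2f}")
    print(designed)
```

Swapping toy_confidence for a real predictor is what makes each step expensive in practice. The loop itself stays this simple.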

AlphaFold and a similar tool, RoseTTAFold, developed in Baker’s lab, were trained to predict the structures of individual proteins. But researchers soon discovered that such networks could also model assemblies of multiple interacting proteins. On this basis, Baker’s team became convinced they could “hallucinate” proteins that self-assemble into nanoparticles of different shapes and sizes; these would be composed of many copies of a single protein, similar to the nanoparticle on which the COVID-19 vaccine is based.

But when they instructed microbes to make the designs in the lab, none of the 150 worked. “They don’t fold at all,” Baker said.

Meanwhile, Justas Dauparas, a machine-learning scientist in the lab, was developing a deep-learning tool to solve the so-called inverse-folding problem: determining the protein sequence that corresponds to a given overall protein shape. The network, called ProteinMPNN, can act as a “spell checker” for designer proteins created using AlphaFold and other tools, adjusting the sequence while maintaining the molecule’s overall shape, Ovchinnikov said.

When Baker and his team applied this second network to their “hallucinated” protein nanoparticles, they had far greater experimental success. Using cryo-electron microscopy and other experimental techniques, the researchers determined the structures of 30 of these new proteins, 27 of which matched the AI-led designs.

The team’s creations include giant rings with complex symmetries unlike anything found in nature. In theory, this approach could be used to design nanoparticles that correspond to almost any symmetrical shape.

Deep Learning Revolution
Arne Elofsson, a computational biologist at Stockholm University, believes that deep learning tools like ProteinMPNN have changed the game for protein design. “Draw your protein, push a button, and you get something that works about one time in ten.” Combining multiple neural networks to handle different parts of the design process, as Baker’s team did when designing nanoparticles, can achieve a higher success rate. “Now we have complete control over the shape of the protein,” Ovchinnikov said.

Of course, the Baker lab isn’t the only lab applying AI to protein design. In a review published this month on bioRxiv, Ferruz and her colleagues counted more than 40 AI protein design tools developed in recent years using a variety of approaches.

Many tools, including ProteinMPNN, address the inverse-folding problem: they specify sequences corresponding to specific structures using methods borrowed from image recognition tools. Others are based on an architecture similar to that of the language models that generate human-like text.

Chloe Hsu, a machine learning researcher at the University of California, Berkeley, who developed an inverse-folding network with researchers at Meta, said that with so many protein design tools available, it isn’t clear how best to compare them.

Many teams evaluate how accurately their AI tools can recover the sequences of existing proteins from their structures. Ferruz said she would like to see a protein design competition similar to the biennial Critical Assessment of Protein Structure Prediction (CASP) experiment. Something like CASP, she said, would really push the field forward.
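The evaluation mentioned above, recovering a known protein’s sequence from its structure, is commonly summarized as a sequence-recovery score: the fraction of positions at which the designed sequence matches the native one. The Python sketch below shows that arithmetic. The function name and the example sequences are hypothetical, chosen for illustration rather than taken from any of the tools discussed here.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Both sequences must be non-empty and the same length, because the
    comparison is position by position with no alignment step.
    """
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


if __name__ == "__main__":
    # Made-up eight-residue sequences differing at two positions.
    print(sequence_recovery("MKTAYIAK", "MKSAYLAK"))
```

A high score shows only that a tool reproduces sequences nature already found. It does not show that a brand-new design will fold, which is why the wet-lab test discussed in the next section still matters.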

Tested by wet lab
Baker and his colleagues believe that making a new protein in the lab is the ultimate test of their approach, as illustrated by the initial failure of the “hallucinated” protein assemblies. “AlphaFold thinks they’re magic proteins, but they obviously don’t work in the wet lab,” said Basile Wicky, a biophysicist in Baker’s lab.

Knowledge card: a wet lab is defined in contrast to a dry lab. A wet lab is a laboratory equipped to handle liquids for analyzing and testing drugs, chemicals, and other biological substances; a dry lab focuses on computer-generated models or simulations and on applied or computational mathematical analysis.

But Jinbo Xu, a computational biologist at the Toyota Technological Institute at Chicago, points out that not all scientists developing AI tools for protein design have easy access to experimental setups. Finding a lab to collaborate with can take time, so Xu is setting up his own wet lab to test the AI tools developed by his team.

Experiments will also be essential when designing proteins with specific tasks, Baker said. In July, his team developed two AI methods capable of embedding specific sequences or structures in a new protein. Using both approaches, they designed enzymes that catalyze specific reactions, proteins that bind to other molecules, proteins for vaccines and proteins to fight respiratory viruses.

Last year, DeepMind set up a spin-off, Isomorphic Labs, in London with the intention of applying AI tools such as AlphaFold to drug discovery. Demis Hassabis, CEO of DeepMind, said he thinks protein design is clearly a promising application of deep learning techniques, especially for AlphaFold. “We’ve done a lot of work in protein design. We’re just getting started.”

“Proteins are the foundation of the whole of biology, and now all the proteins we find in every plant, animal and microbe are less than one percent of all possible proteins,” Baker said. “With these new software tools, researchers may be able to find long-term solutions to difficult problems in medicine, energy and technology.”