Decades of turbo-charged biotech could follow DeepMind’s game-changing AlphaFold algorithm
“In a serious sense, the single protein, single domain problem is largely solved. The question is… What now?”
— John Moult, PhD (CASP founder), December 4, 2020
DeepMind’s extraordinary AlphaFold has generated a lot of buzz and a lot of questions. How important is this result? What will happen next? How will the scientific community translate these results into impactful technologies? This essay seeks to answer these questions.
What are the implications of DeepMind’s exemplary performance in CASP14?
DeepMind’s AlphaFold is a major advancement for the entire scientific community. AlphaFold’s demonstration that deep learning can solve a difficult biological problem will “turbo-charge” science (to borrow a term from DeepMind CEO Demis Hassabis) by inspiring the application of similar methods in fields ranging from medicine to agriculture.
DeepMind’s vast resources allowed it to test dozens of deep learning models before arriving at the final form of AlphaFold 2. It ultimately chose an end-to-end deep learning process, which enables two key features: consideration of multiple biological properties in tandem, and a confidence metric that the process assigns to its own result. The confidence metric not only helped researchers determine the best network type for the problem but also helps the process steer its own predictions. An important finding is that a soft, unordered attention network is well suited to representing and predicting protein structure. In short, the key algorithmic features behind AlphaFold’s success will now become known to the wider scientific community.
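To make “soft and unordered attention” a little more concrete, here is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not AlphaFold’s network, which is far more elaborate. It assumes identity query/key/value projections and no positional encoding. We read “soft” as the softmax weights, which are positive and sum to 1. We read “unordered” as the property that shuffling the inputs simply shuffles the outputs. The names `softmax`, `self_attention`, and `residues` are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: soft weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(items: List[Vector]) -> List[Vector]:
    """Generic scaled dot-product self-attention with identity projections.

    Every item attends to every item (including itself). No positional
    encoding is added, so the input order carries no information.
    """
    d = len(items[0])
    outputs = []
    for query in items:
        scores = [dot(query, key) / math.sqrt(d) for key in items]
        weights = softmax(scores)
        # Output is a softly weighted average of all items (the "values").
        out = [sum(w * v[i] for w, v in zip(weights, items)) for i in range(d)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for residues.
    residues = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    out = self_attention(residues)
    # Reversing the input order reverses the outputs but otherwise leaves them unchanged.
    out_rev = self_attention(residues[::-1])
    for a, b in zip(out, out_rev[::-1]):
        assert all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b))
    print("permutation-equivariant:", True)
```

Real systems add learned projections and, in AlphaFold’s case, many other components. The sketch isolates only the two properties the paragraph names.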
Monomeric soluble protein structure prediction is largely solved
Monomeric soluble protein structure prediction is “largely solved,” as CASP founder John Moult, PhD, stated directly during the CASP14 conference. There are two major caveats. First, monomeric soluble protein structure prediction is a relatively simple structural biology problem, and it is certainly not the only one. Second, others must replicate AlphaFold’s performance for it to have a truly wide-reaching impact.
Many complicated problems remain unsolved. Structure prediction of oligomeric proteins and membrane-bound proteins is not solved, nor is prediction of protein binding. Protein design is not solved. Protein dynamics is not solved. The most widely used protein modeling software in the world, Rosetta, has a proven track record of addressing these complex problems, even without deep learning.
Deep learning for protein modeling will become widely available via Rosetta
Rosetta was first developed approximately two decades ago with a single capability. Today, AlphaFold likewise has a single capability. In fact, the two share the same first capability: monomeric soluble protein structure prediction.
Since its inception, Rosetta has expanded to address a multitude of complex structural biology problems. Because Rosetta is modular, deep learning processes can be implemented within an existing framework that already models complex biological problems with real impact. One example is designing a new protein with multiple conformations and binding interactions for use in cancer treatment (Lajoie et al 2020).
Most likely, the game-changing deep learning methods presented by DeepMind will reach broad application through integration into Rosetta, which is owned by the University of Washington and licensed to thousands of academic and corporate users. Many other research groups will also be inspired to incorporate deep learning into their scientific approaches.
Furthermore, Hassabis stated that DeepMind is in the early stages of forming collaborations, so we may look forward to DeepMind’s direct involvement in biotech applications.
Rosetta already uses deep learning to design proteins
AlphaFold’s 2018 implementation was so impressive (AlQuraishi 2018) that it inspired academic scientists to develop deep learning algorithms in Rosetta (Yang et al 2020). Deep learning Rosetta has since achieved the ground-breaking feat of “hallucinating” proteins; the new method imagined 27 brand new proteins that folded correctly when created in the lab (Anishchenko et al 2020). Deep learning Rosetta was also applied to the complex biological problem of designing new protein-protein interfaces (Tischer et al 2020).
The importance of these results cannot be overstated. Because deep learning Rosetta was inspired by AlphaFold’s 2018 implementation, the Rosetta results are an especially striking demonstration of how quickly scientists can adapt deep learning and apply it to a variety of complex and exciting biological design problems.
Deep learning is rising to relevance in many areas of science beyond what is described here, with potential for high impact on many aspects of human health and technological advancement.
This is a beginning
Moult’s assertion that monomeric soluble protein structure prediction is “solved” is remarkable not because it marks the end of anything, but because it shows that we are witnessing the exponential acceleration of our biotech century.