AI-Fueled Software Reveals Accurate Protein Structure Prediction

“The dream of predicting a protein shape just from its gene sequence is now a reality,” said Paul Adams, Associate Laboratory Director for Biosciences at Berkeley Lab. For Adams and other structural biologists who study proteins, predicting their shape offers a key to understanding their function and accelerating treatments for diseases like cancer and COVID-19.

The current approaches to accurately mapping that shape, however, usually rely on complex experiments at synchrotrons. But even these sophisticated processes have their limitations: the data and quality aren’t always sufficient to understand a protein at an atomic level. By applying powerful machine learning methods to the large library of protein structures, it is now possible to predict a protein’s shape from its gene sequence.

Researchers in Berkeley Lab’s Molecular Biophysics & Integrated Bioimaging Division joined an international effort led by the University of Washington to produce a computer software tool called RoseTTAFold. The algorithm simultaneously takes into account patterns, distances, and coordinates of amino acids. As these data inputs flow in, the tool assesses relationships within and between structures, eventually helping to build a very detailed picture of a protein’s shape.

To verify the accuracy of the predictions, the subteam of structural biologists led by Adams used a RoseTTAFold prediction to solve the structure of a new protein from experimental data they had previously collected. In addition, Randy Read of Cambridge University, who collaborates with Adams on the development of the Phenix software suite for automated determination of molecular structures, tested the new program on crystallographic data that had also proved impossible to solve using existing methods.

Accurate structural models can provide insight into how mutations in specific proteins can lead to disease. “These prediction tools will catalyze more structural biology experiments to understand the fine chemical details of proteins that the other prediction methods can’t provide,” said Adams. “All of this will help advance our fundamental understanding of biology, design new and improved drugs, and help engineer proteins for the growing bioeconomy.”

