In biology, DNA is the blueprint: it codes for the proteins that carry out critical life-sustaining functions in our bodies. But like any finely tuned mechanism, our bodies are vulnerable to threats, including pathogens such as viruses and diseases such as cancer.
Consider the possibilities if we could accelerate the development of vaccines or drugs to combat emerging pathogens. Or imagine if we could automatically produce proteins that correct the DNA errors leading to cancer. Identifying proteins that bind effectively to targets or speed up chemical reactions is crucial for drug development, diagnostics, and many industrial applications, but finding them is often a lengthy and expensive process.
In response to this challenge, researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) developed “FrameDiff,” a computational tool designed to create new protein structures that go beyond what nature offers. This machine learning-based method generates “frames” that align with the inherent properties of protein structures, enabling the generation of novel proteins without relying on existing designs.
“In nature, protein design is a slow-burning process that takes millions of years. Our technique aims to provide an answer to tackling human-made problems that evolve much faster than nature’s pace,” explains Jason Yim, a PhD student at MIT CSAIL and a lead author of the research paper. “This new capacity to generate synthetic protein structures opens up many enhanced capabilities, such as better binders. This means engineering proteins that can attach to other molecules more efficiently and selectively, with implications for targeted drug delivery and biotechnology. It could lead to better biosensors and have significant effects in biomedicine and beyond, such as developing more efficient photosynthesis proteins, creating more effective antibodies, and engineering nanoparticles for gene therapy.”
The approach behind FrameDiff relies on understanding the complex structures of proteins, which are composed of atoms linked by chemical bonds. The key atoms that dictate the 3D shape of the protein are referred to as the “backbone.” Noticing a repeating pattern among these atoms, the researchers built a machine learning algorithm using concepts from differential geometry and probability. The algorithm works with “frames,” each modeled as a rigid body with a position and a rotation in 3D space.
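To make the idea of a frame concrete, here is a minimal sketch in Python of how a rigid-body frame (a rotation plus a position) might be built from three backbone atoms. It assumes the common convention of using each residue's nitrogen, alpha-carbon, and carbon atoms, with the alpha-carbon as the origin; the article does not say which atoms FrameDiff uses, so treat that choice, the function names `frame_from_atoms` and `to_global`, and the toy coordinates as illustrative assumptions rather than the authors' implementation.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def frame_from_atoms(n, ca, c):
    """Build a frame (axes, origin) from three atom positions.

    Assumption: atoms are N, CA, C and the origin sits at CA.
    The axes are made orthonormal with Gram-Schmidt.
    """
    e1 = normalize(sub(c, ca))                 # first axis points from CA to C
    v2 = sub(n, ca)
    e2 = normalize(sub(v2, scale(e1, dot(e1, v2))))  # remove the e1 component
    e3 = cross(e1, e2)                         # third axis completes a right-handed basis
    return (e1, e2, e3), ca

def to_global(frame, local_point):
    """Map a point expressed in the frame's local coordinates to global coordinates."""
    axes, origin = frame
    return tuple(
        origin[i] + sum(local_point[k] * axes[k][i] for k in range(3))
        for i in range(3)
    )

# Toy coordinates chosen for readability, not taken from a real protein.
frame = frame_from_atoms(n=(1.0, 4.0, 1.0), ca=(1.0, 1.0, 1.0), c=(3.0, 1.0, 1.0))
```

With this representation, a whole backbone becomes a sequence of such frames, and moving or rotating a residue amounts to changing one frame's origin or axes while the atoms inside it keep their relative geometry.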
In 2021, DeepMind introduced AlphaFold2, a deep learning algorithm that predicts 3D protein structures from their sequences. That breakthrough inspired FrameDiff’s incorporation of frames into diffusion models, a generative technique that has become increasingly popular in image generation.
In collaboration with the Institute for Protein Design at the University of Washington, the researchers used this principle to create and experimentally validate novel proteins. The resulting tool, “RFdiffusion,” has helped protein designers get closer to addressing significant challenges in biotechnology, including the design of specific protein binders for vaccine development, the engineering of symmetric proteins for gene delivery, and the precise design of enzymes.
Looking forward, the team aims to extend FrameDiff to problems that combine multiple requirements, as is common for biologics such as drugs. They also want to generalize the models to all biological modalities, including DNA and small molecules. With more extensive training data and further optimization, they anticipate that FrameDiff could generate foundational structures with design capabilities matching those of RFdiffusion, while keeping its inherent simplicity.
“The possibility of rapidly generating structures extending to large lengths opens up with the discarding of a pre-trained structure prediction model [in FrameDiff],” says Sergey Ovchinnikov, a computational biologist at Harvard University. The researchers’ approach marks a significant step toward overcoming the limitations of current structure prediction models. Although it is still in its early stages, it is a promising development that moves us closer to a future where protein design plays a pivotal role in addressing humanity’s most critical challenges.
This research, advised by MIT professors Regina Barzilay and Tommi Jaakkola, was conducted by a team including Jason Yim, postdoc Brian Trippe from Columbia University, researcher Valentin De Bortoli from the French National Center for Scientific Research in Paris, postdoc Emile Mathieu from Cambridge University, and Professor Arnaud Doucet from Oxford University. The work was funded in part by the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, EPSRC grants, and various other sources. The findings will be presented at the International Conference on Machine Learning in July.