2020 has been a challenging year for mankind, but DeepMind notched up a streak of wins, showcasing an AI that cracks the fifty-year-old problem of protein folding.
For the past 50 years, scientists have been struggling with the problem of “protein folding”: mapping the three-dimensional shapes of the proteins that are responsible for diseases from cancer to Covid-19.
What are proteins and protein folding?
· Amino acids are the basic building blocks of life
· Proteins are chains of amino acids and are the workhorses of living organisms (structure providers, movers, reaction catalysts, etc.)
· Protein folding is a sensitive process influenced by several external factors, including electric and magnetic fields, temperature, pH, chemicals, space limitation and molecular crowding
· Function: a protein’s 3D structure determines its function
· Problem: an estimated 10^143 possible ways to fold (Levinthal’s paradox)
· Disease: protein misfolding is believed to be a primary cause of degenerative and neurodegenerative disorders
· Datasets: roughly 200 million known proteins, but only about 170,000 solved 3D structures
· Cost: determining a structure by X-ray crystallography costs about $120,000 and takes about a year
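To get a feel for where a number like 10^143 can come from, here is a small back-of-the-envelope sketch. The chain length, the two torsion angles per residue, the three states per angle and the sampling rate are all illustrative assumptions, not figures from DeepMind or from Levinthal's original argument; they simply show one set of inputs that lands on the same order of magnitude.

```python
import math

# Illustrative assumptions (not taken from the article):
RESIDUES = 150            # length of a hypothetical protein chain
ANGLES_PER_RESIDUE = 2    # two backbone torsion angles per residue
STATES_PER_ANGLE = 3      # three stable states per angle
SAMPLES_PER_SECOND = 1e13 # hypothetical rate of trying conformations

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(residues, angles_per_residue, states_per_angle):
    """Return log10 of the number of conformations, states ** angles."""
    total_angles = residues * angles_per_residue
    return total_angles * math.log10(states_per_angle)


def log10_years_to_enumerate(log10_count, samples_per_second):
    """Return log10 of the years needed to try every conformation once."""
    return log10_count - math.log10(samples_per_second) - math.log10(SECONDS_PER_YEAR)


exponent = log10_conformations(RESIDUES, ANGLES_PER_RESIDUE, STATES_PER_ANGLE)
log_years = log10_years_to_enumerate(exponent, SAMPLES_PER_SECOND)

print(f"about 10^{exponent:.0f} conformations")
print(f"about 10^{log_years:.0f} years to try them all")
```

Working in logarithms keeps the numbers manageable. The point of the paradox is that real proteins fold in a tiny fraction of that time, so they cannot be searching at random.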
Why does protein structure prediction matter?
· Proteins are the building blocks of life, and their shapes are closely linked with their functions
· The ability to predict protein structures accurately enables a better understanding of what they do and how they work
What is AlphaFold and how does it work?
AlphaFold is an artificial intelligence program developed by Google’s DeepMind that predicts protein structure. It is a deep learning system built to predict folded protein structures to within the width of an atom.
CASP (Critical Assessment of protein Structure Prediction) is a community forum that allows researchers to share progress on the protein folding problem. The community also organizes a biennial challenge for research groups to test the accuracy of their predictions against real experimental data.
AlphaFold came top of the table at the last CASP in 2018, the first year that London-based DeepMind participated. But this year, the outfit’s deep-learning network was head and shoulders above other teams and, scientists say, performed so mind-bogglingly well that it could herald a revolution in biology.
The latest version of DeepMind’s AlphaFold, a deep-learning system that can accurately predict the structure of proteins to within the width of an atom, has cracked one of biology’s grand challenges.
In CASP, results are scored using what’s known as a global distance test (GDT), which measures on a scale from 0 to 100 how close a predicted structure is to the actual shape of a protein identified in lab experiments. The latest version of AlphaFold scored well for all proteins in the challenge, and it achieved a GDT score above 90 for around two-thirds of them. Its GDT for the hardest proteins was 25 points higher than that of the next-best team, says John Jumper, who heads up the AlphaFold team at DeepMind. In 2018 the lead was around six points.
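As a rough illustration of how a GDT-style score can be computed, here is a minimal sketch. It assumes the commonly used GDT_TS convention (the average, over cutoffs of 1, 2, 4 and 8 angstroms, of the percentage of residues whose predicted position lies within that cutoff of the experimental one), and it assumes the two structures have already been superimposed. The real CASP scoring searches over many superpositions to find the best one, which this sketch skips; the function name and the sample distances are made up for the example.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue distances in angstroms between predicted and
    experimental positions, for one fixed superposition (an assumption;
    real GDT optimizes the superposition for each cutoff).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(1 for d in distances if d <= cutoff) / len(distances)
        for cutoff in cutoffs
    ]
    # Average the fractions and scale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue errors for a five-residue toy example.
toy_distances = [0.5, 0.9, 1.5, 3.0, 9.0]
toy_score = gdt_ts(toy_distances)
print(f"GDT_TS = {toy_score:.1f}")
```

In this toy case two of five residues are within 1 angstrom, three within 2, and four within both 4 and 8, so the score works out to 65. A score above 90 requires nearly every residue to sit within the tightest cutoffs.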
A score above 90 means that any differences between the predicted structure and the actual structure could be down to experimental errors in the lab rather than a fault in the software. It could also mean that the predicted structure is a valid alternative configuration to the one identified in the lab, within the range of natural variation.
In the results from the 14th CASP assessment, the latest AlphaFold system achieves a median score of 92.4 GDT across all targets. This means that its predictions have an average error (RMSD) of approximately 1.6 Angstroms, which is comparable to the width of an atom (about 0.1 of a nanometer). Even for the very hardest protein targets, those in the most challenging free-modeling category, AlphaFold achieves a median score of 87.0 GDT.
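For readers unfamiliar with RMSD (root-mean-square deviation), the sketch below shows the basic calculation on a toy example. It assumes the predicted and experimental coordinates are already superimposed and listed in the same atom order; the coordinates are invented so that the answer lands on the 1.6 Angstrom figure quoted above, and they are not real protein data.

```python
import math

ANGSTROMS_PER_NANOMETER = 10.0


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates in angstroms, assuming they are already superimposed."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared_error = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        squared_error += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_error / len(predicted))


# Hypothetical two-atom structures, offset by 1.6 angstroms along z.
toy_predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_experimental = [(0.0, 0.0, 1.6), (1.0, 0.0, 1.6)]

error_angstroms = rmsd(toy_predicted, toy_experimental)
error_nanometers = error_angstroms / ANGSTROMS_PER_NANOMETER
print(f"RMSD = {error_angstroms:.2f} angstroms = {error_nanometers:.2f} nm")
```

Note that 1.6 Angstroms is 0.16 nanometers, so the error is on the same scale as a single atom, which is roughly 0.1 nanometer across.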
Potential for real-world impact:
· A tool like AlphaFold might help rare disease researchers predict the shape of a protein of interest rapidly and economically.
· It would vastly accelerate efforts to understand the building blocks of cells and enable quicker and more advanced drug discovery.
· This approach could help to illuminate the function of the thousands of unsolved proteins in the human genome, and make sense of disease-causing gene variations that differ between people.
The success of DeepMind’s first foray into protein folding indicates how machine learning systems can integrate diverse sources of information to help scientists come up with creative solutions to complex problems at speed. There’s still much to be done in the realm of protein biology.