Metaculus Help: Spread the word
If you like Metaculus, tell your friends! Share this question via Facebook, Twitter, or Reddit.
Breakthrough in protein prediction by 2031
Cross-posted on Metaculus.
Proteins are large, complex molecules essential in sustaining life. Nearly every function our body performs—contracting muscles, sensing light, or turning food into energy—can be traced back to one or more proteins and how they move and change. The recipes for those proteins—called genes—are encoded in our DNA.
What any given protein can do depends on its unique 3D structure. For example, antibody proteins that make up our immune systems are ‘Y-shaped’, and are akin to unique hooks. By latching on to viruses and bacteria, antibody proteins are able to detect and tag disease-causing microorganisms for extermination. Similarly, collagen proteins are shaped like cords, which transmit tension between cartilage, ligaments, bones, and skin.
Other types of proteins include CRISPR and Cas9, which act like scissors and cut and paste DNA; antifreeze proteins, whose 3D structure allows them to bind to ice crystals and prevent organisms from freezing; and ribosomes that act like a programmed assembly line, which help build proteins themselves.
But figuring out the 3D shape of a protein purely from its genetic sequence is a complex task that scientists have found challenging for decades. The challenge is that DNA only contains information about the sequence of a protein’s building blocks called amino acid residues, which form long chains. Predicting how those chains will fold into the intricate 3D structure of a protein is what’s known as the “protein folding problem”.
The bigger the protein, the more complicated and difficult it is to model because there are more interactions between amino acids to take into account. As noted in Levinthal’s paradox, it would take longer than the age of the universe to enumerate all the possible configurations of a typical protein before reaching the right 3D structure.
The ability to predict a protein’s shape is useful to scientists because it is fundamental to understanding its role within the body, as well as diagnosing and treating diseases believed to be caused by misfolded proteins, such as Alzheimer’s, Parkinson’s, Huntington’s and cystic fibrosis.
An understanding of protein folding will also assist in protein design, which could unlock a tremendous number of benefits. For example, advances in biodegradable enzymes—which can be enabled by protein design—could help manage pollutants like plastic and oil, helping us break down waste in ways that are more friendly to our environment. In fact, researchers have already begun engineering bacteria to secrete proteins that will make waste biodegradable, and easier to process.
Over the past five decades, scientists have been able to determine shapes of proteins in labs using experimental techniques like cryo-electron microscopy, nuclear magnetic resonance or X-ray crystallography, but each method depends on a lot of trial and error, which can take years and cost tens of thousands of dollars per structure. This is why biologists are turning to AI methods as an alternative to this long and laborious process for difficult proteins.
Critical Assessment of protein Structure Prediction, or CASP, is a community-wide, worldwide experiment for protein structure prediction taking place every two years since 1994. CASP provides research groups with an opportunity to objectively test their structure prediction methods and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users.
Even though the primary goal of CASP is to help advance the methods of identifying protein three-dimensional structure from its amino acid sequence, many view the experiment more as a “world championship” in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis and it is not uncommon for entire groups to suspend their other research for months while they focus on getting their servers ready for the experiment and on performing the detailed predictions.
In the most recent CASP experiment, 98 entries were accepted for 43 protein structures. The entry ranked second correctly solved three of the 43 protein structures, for a success rate of 7%.
The entry ranked first, that of Google DeepMind's algorithm AlphaFold, correctly solved 25 of the 43 protein structures, or 58.1%. Here is a non-technical press article on the feat, and here is DeepMind's blog post on it.
This question asks: Before 2031, will any entry to CASP correctly solve at least 90% of available protein structures?
This resolves positive if any entry to CASP achieves at least a score of 90 mean GDT-TS. GDT-TS is a global distance test measure of prediction accuracy ranging from 0 to 100, with 100 being perfect. If the CASP stops being run before this is achieved or before 2031, the question resolves as ambiguous.
(Edited 2020-12-01 to add ambiguous resolution if CASP stops being run.)
Metaculus help: Predicting
Predictions are the heart of Metaculus. Predicting is how you contribute to the wisdom of the crowd, and how you earn points and build up your personal Metaculus track record.
The basics of predicting are very simple: move the slider to best match the likelihood of the outcome, and click predict. You can predict as often as you want, and you're encouraged to change your mind when new information becomes available.
The displayed score is split into current points and total points. Current points show how much your prediction is worth now, whereas total points show the combined worth of all of your predictions over the lifetime of the question. The scoring details are available on the FAQ.
Note: this question resolved before its original close time. All of your predictions came after the resolution, so you did not gain (or lose) any points for it.
Note: this question resolved before its original close time. You earned points up until the question resolution, but not afterwards.
This question is not yet open for predictions.
Metaculus help: Community Stats
Use the community stats to get a better sense of the community consensus (or lack thereof) for this question. Sometimes people have wildly different ideas about the likely outcomes, and sometimes people are in close agreement. There are even times when the community seems very certain of uncertainty, like when everyone agrees that event is only 50% likely to happen.
When you make a prediction, check the community stats to see where you land. If your prediction is an outlier, might there be something you're overlooking that others have seen? Or do you have special insight that others are lacking? Either way, it might be a good idea to join the discussion in the comments.