It’s not often that a single news event changes the trajectory of humanity’s future. The last one was Tesla Battery Day, after which we have spent more than a month pondering how this advancement will shape the timeline of the battery revolution, trying to grasp its significance. That revelation brought us closer to sustainability and yet showed us that, for preventing catastrophic climate change, it is still out of reach.
This week, we again had one of those days in which a single news event changes the trajectory of humanity’s future: a scientific advance that was not expected for another decade, and one that many expected would take a quantum computer to achieve. DeepMind solved a problem that scientists have been struggling with for over 50 years: protein folding.
What are proteins?
Proteins can be described as the lowest level of little robots that can turn A into B. You have atoms. When combined, they become amino acids. When amino acids get combined, you get proteins. When you combine proteins with carbohydrates, lipids, and nucleic acids, you start to make up the components of a cell.
Without going much deeper into biology, proteins are not that simple to explain, so let me give you some examples. Many hormones and growth factors are proteins, including insulin and oxytocin. Proteins can also be receptors. Hemoglobin is a protein animals have in their blood that apparently makes meat tastier. Then there is leghemoglobin, a plant alternative that Impossible Foods extracts from a modified yeast to make its Impossible Burger so tasty.
SARS-CoV-2, the virus behind COVID-19, has multiple proteins we did not initially understand, including the spike protein that makes the virus so dangerous. Understanding it took a painstaking amount of time and money.
The problem with proteins: folding
Finding out what a protein is made of, and what is connected to what within its amino acid chains, is not much of a problem. In other words, we can make ourselves a 2D blueprint. That, however, is a far cry from an actual working 3-dimensional machine. The amino acids in the protein fold into a complex 3D shape, and that shape is what makes it functional. Hemoglobin, when folded, has a perfect spot to bind an oxygen molecule, and when it reaches its destination it changes shape to release that oxygen molecule.
CRISPR helps with editing DNA, and it does so using the Cas9 protein to cut it, but figuring that out and understanding how it works is not easy. It involves a lot of trial and error and some truly exhaustive work. The methods for determining the 3-dimensional shape of a protein require multi-million-dollar laboratory equipment for cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography, as well as years of work for just a single protein. In one relevant example, even after 10 years, a team with the right equipment was unable to map the folds of a protein.
The solution: DeepMind’s AI
DeepMind is a subsidiary of Alphabet Inc., Google’s parent company. It is well known for creating AlphaGo, an AI that in 2016 could beat even the world champions in the Chinese board game Go. DeepMind is also known for creating AlphaStar, which plays the popular esports game StarCraft II and, as of 2020, is on its way to becoming world champion. Games, however, are not the only thing DeepMind works on, and today we are talking about a project of theirs called AlphaFold, an AI that can predict how proteins will fold.
AlphaFold first made headlines in 2018, when it predicted how proteins fold significantly more accurately than any previous method. However, for a solution to the problem to be viable, we need a 90% match or more. At that point, the margin of error is approximately the width of one atom. That is only the easy way of thinking about it, because in reality what we need is a score of 90 out of 100 on the Global Distance Test (GDT) scale. In 2018, AlphaFold nearly reached 60 GDT, a huge leap over the previous records of just a bit over 40 GDT. Now, in 2020, DeepMind has introduced AlphaFold 2. The system achieved a median score of 92.4 GDT overall and a median score of 87 GDT in the most difficult category, called free-modeling. At this point, any remaining discrepancies could simply be errors in the laboratory measurements or a valid alternative shape of the protein.
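To make the GDT score a little more concrete, here is a minimal Python sketch of how a GDT-style total score can be computed. It assumes the predicted and experimental structures have already been superimposed and that we have one distance (in angstroms) per residue between them. The function name `gdt_ts` and the input format are illustrative assumptions, not the CASP tooling; the real assessment also searches over many superpositions to maximize each count, which this sketch skips.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) conventionally used for the GDT total score.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = GDT_TS_CUTOFFS) -> float:
    """Simplified GDT-style score on a 0-100 scale.

    `distances` holds, for each residue, the distance between its predicted
    and experimentally determined position after ONE fixed superposition
    (an assumption of this sketch).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # For each cutoff, the fraction of residues predicted within that distance.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    # The score is the average of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Four residues: one nearly perfect, two close, one far off.
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
```

A prediction where every residue sits within one angstrom of its measured position scores 100, which is why a score above 90 is treated as being on par with the experimental methods.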
Now, rather than waiting years, scientists can fold proteins within mere days. Critics are quick to point out that some competitors can get a less accurate result within seconds and that in some applications speed is more important. There is truth to this, and the reason why will be made apparent in the next part, but without accurate protein folding, medical applications will not be approved.
Another criticism is that the scope of the test and of the data the AI was trained on was not wide enough for the results to be truly practical in the real world. Both DeepMind and the organizers of CASP (Critical Assessment of protein Structure Prediction), who performed the tests, dispute this. In the end, only after the program is put into wide use will we know for sure, but this argument is a lot more controversial than the one about speed.
So, what does this mean for the world, for the future?
No one can say for certain, because this opens up new possibilities and changes the timelines and plans of many scientists and corporations. One thing does seem likely: once AlphaFold has been peer reviewed some more and its fundamental principles have been explained to the scientific community, it could very quickly become the number one tool in a molecular biologist’s toolkit.
From there, out of the 20,000 human proteins, the remaining 15,000 can be mapped and explored in a fraction of the time the first quarter took. Drugs can then be created to target or make use of those proteins. Just as folded hemoglobin has a perfect spot to which oxygen binds, we could make drugs that perfectly target the COVID-19 spike protein.
However, and this is where the cleantech part starts to come into view, we currently have somewhere close to 20 million proteins on record that we can now also fold and analyze for their unique properties in ways we never could before. Some of those proteins could create completely new materials, and others could break down various types of trash. Right now, we are already excited by enzymes that can eat plastic. Imagine a protein that combines multiple enzymes to break down paper, plastic, and natural waste. Imagine a protein that is perfectly designed to break down stains on clothing and would let us do all our laundry with detergent and cold water rather than warm water. The same goes for the dishwasher.
At this point, we are getting more into synthetic biology, where we change DNA and design new proteins ourselves. In nature, we find only 20, or arguably 21, amino acids used in proteins, and with just those, nature has, through the slow process of evolution, only had time to explore a tiny fraction of the possible protein permutations. By understanding how protein folding works, we could in principle simulate any of them, and there is no law of physics prohibiting us from trying to add more amino acids to the list after that.
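A rough back-of-the-envelope sketch shows why evolution can only have sampled a tiny fraction of that space. It simply counts the possible amino acid sequences of a given length; the function name and the chain length of 100 are illustrative assumptions (many real proteins are longer).

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct amino acid chains of `length` residues,
    given `alphabet_size` possible amino acids at each position."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return alphabet_size ** length


if __name__ == "__main__":
    # A two-residue chain already has 20 * 20 = 400 possibilities.
    print(sequence_space(2))
    # A modest 100-residue chain has a 131-digit number of possibilities.
    print(len(str(sequence_space(100))))
    # Adding a 21st amino acid grows the space further still.
    print(sequence_space(100, 21) > sequence_space(100, 20))
```

Python integers have arbitrary precision, so the count is exact rather than a floating-point approximation.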
There is still a lot of work to be done
Thanks to this breakthrough, we are now at the point where we can use the AI, choose a protein, and figure out how it looks when folded within a few days. With 1,000 labs using this AI, it would take us 30–45 days to map out the rest of the human proteins, and up to roughly 165 years to map out the rest of the proteins in the database. Having that, however, is not enough. We will then also need an AI that helps us understand the physical interactions in these proteins: how proteins together form more complicated systems, and how they interact with DNA, RNA, small molecules, and the environment you want to introduce them into. Finally, an AI capable of doing all that will need to be able to predict which of the proteins it has analyzed or synthesized have unique properties that would be interesting to bring to our attention.
In any case, it is a fantastic breakthrough. How this changes our future is not yet known, but the fields of molecular biology and synthetic biology have just become even more exciting than they already were.
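The 30–45 day and 165 year figures can be reproduced with simple arithmetic. The sketch below assumes that "a few days" means two to three days per protein and that each lab folds one protein at a time; both are assumptions made here for illustration, as are the function names.

```python
import math


def days_to_map(n_proteins: int, n_labs: int, days_per_protein: float) -> float:
    """Days needed if the proteins are split evenly across the labs and
    each lab folds one protein at a time (an assumption of this sketch)."""
    proteins_per_lab = math.ceil(n_proteins / n_labs)
    return proteins_per_lab * days_per_protein


def days_to_years(days: float) -> float:
    return days / 365


if __name__ == "__main__":
    labs = 1_000
    # Remaining human proteins: 15,000 across 1,000 labs is 15 per lab.
    print(days_to_map(15_000, labs, 2), days_to_map(15_000, labs, 3))  # 30 45
    # Whole database: 20 million proteins is 20,000 per lab.
    print(round(days_to_years(days_to_map(20_000_000, labs, 3))))  # 164
```

Under these assumptions, the 165 year figure corresponds to the slower pace of three days per protein; at two days per protein, the same calculation gives roughly 110 years.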