AlphaFold 2 breakthrough - Why it is incredible! Explanation by David Friedberg

on January 15, 2021

AlphaFold2 Explanation by David Friedberg 

The numbers to remember are 4, 3 and 20. There are four nucleotide bases, A, C, T and G, that make up your DNA; we all learned this in high school biology. Each set of three bases (a codon) defines an amino acid. There are 20 amino acids, and a protein is a string of amino acids.
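To make the 4, 3 and 20 concrete: four bases taken three at a time give 4 × 4 × 4 = 64 possible codons, and those 64 codons map onto the 20 amino acids plus "stop" signals. The Python sketch below builds the standard genetic code (the table most organisms use) and reads a DNA coding sequence three letters at a time. The helper name `translate` is illustrative, and the sketch ignores details such as alternative genetic codes and RNA input.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons (three bases at a time) from the start and stops at the
    first stop codon. Trailing bases that don't form a full codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                        # 64 codons (4 ** 3)
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))   # MAIVMGR
```

Several codons map to the same amino acid, which is how 64 codons cover only 20 amino acids: `GCT`, `GCC`, `GCA` and `GCG`, for example, all encode alanine (A).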


Protein primary structure


In every cell in your body there are organelles called ribosomes that make proteins: the cell takes a copy of the DNA, and the ribosomes read that copy and turn it into a chain of amino acids. That chain is what we call a protein. But what's interesting is that when you make a chain of amino acids, with any of the 20 amino acids at each point in the chain, it doesn't stay a long chain. The whole thing collapses into a very specific shape, and the shape of that protein is what defines its function. Pretty much every biological function across all life is carried out by proteins. Hemoglobin in our red blood cells, for example, has a very specific little pocket where oxygen molecules stick, and it carries that oxygen from your lungs to your cells. It's an amazing protein, and it is shaped specifically to do that exact function. Other proteins can rip apart other molecules by breaking a molecular bond, or take nitrogen out of the atmosphere and turn it into a form that plants can use to grow.

There is an incredible amount of potential on the nanoscale in what you can do with proteins. We see it in life, and we're shocked, awed and amazed by it every day. But to figure out how to create proteins that do specific things, you have to know how a chain of amino acids turns into the shape that the protein ultimately takes, and that's what's called protein folding.

Why is this important? Because we can easily read DNA, and therefore we can figure out the amino acid sequence that makes up a protein. What we don't know well is the shape of that protein, and therefore how it carries out the function we see it performing in biology. Now think about the reverse: if you have a function you want to carry out in biology, you could design a protein to do that function for you.

For example, a protein could bind to a specific point on a cancer cell or take carbon out of the atmosphere; pretty much anything your mind can imagine on the nanoscale, proteins can be designed to do. The challenge is how to write the code, which is the DNA, that makes the protein that does that thing. We don't know how the code turns into the shape, and that's the folding problem. The folding problem is this: we have a data set of the three-dimensional shapes of proteins and of the DNA code that defines each protein's amino acid sequence, and the question is how to predict the shape of a protein from its amino acid sequence. It has been an impossibility. Think again about this chain of amino acids: each one carries its own little electrical charges, and the way they bind to each other is so complicated that you can't just define it deterministically. We don't have that level of understanding at the quantum scale.

What AlphaFold has done is predict, from a sequence of amino acids, the shape the protein will ultimately take. It learned this from a database of hundreds of thousands of protein structures that were determined with really, really difficult experimental techniques, such as X-ray crystallography and electron microscopy, paired with the corresponding sequences, so that it could work out the relationship between the two. The accuracy of their predictive model is now within the range of error of the experimental methods used to actually measure those proteins.
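The accuracy figures quoted for AlphaFold come from CASP, the biennial protein structure prediction assessment, which scores predictions mainly with GDT_TS on a 0 to 100 scale. GDT_TS averages the percentage of residues (their C-alpha atoms) that land within 1, 2, 4 and 8 ångströms of their experimentally measured positions. The sketch below is a simplified version that assumes the predicted and experimental structures have already been superimposed; the official metric also searches over superpositions to maximize each percentage. The function name `gdt_ts` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts(predicted: Sequence[Coord], experimental: Sequence[Coord]) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    Assumes both structures are already superimposed and list the C-alpha
    atom of each residue in the same order. The real metric searches over
    many superpositions to maximize each percentage; this sketch does not.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    # Distance (in angstroms) between each predicted and measured residue.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # angstroms
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(predicted, experimental))  # 56.25
```

In this toy example the residues are off by 0.5, 1.5, 3.0 and 10.0 Å, so 25%, 50%, 75% and 75% of them fall within the four cutoffs, and the average is 56.25. A perfect prediction scores 100.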

That's incredible, because now, in theory, you could come up with a design for a protein and actually build it by writing its amino acid sequence, and that protein could do any number of things you want. This is a problem that humanity has been challenged by for decades and that has seemed intractable, and this machine learning breakthrough was realized in literally less than three years. These guys were at a score of 40 last year, and this year they're at nearly 90, which is incredible! So now we can predict what the shape will be from the DNA sequence, and this is going to unlock a whole new ability. Everyone is going to take their model, if they license it or whatever they do with it, or people are going to learn the same techniques that DeepMind used. Either way, it means that it's possible. Scientists will say: "I want to do this particular thing on a microscopic scale. Let me design a protein in three-dimensional space to do that thing. Now let me figure out how to make that protein by writing the DNA code." That's really easy if you can use this algorithm to solve it for you, and it literally costs dollars and pennies to make proteins.
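Going the other way, from a designed amino acid sequence to DNA that a microbe could express, is the "writing the DNA code" step. Because most amino acids have several codons, many different DNA sequences encode the same protein. The sketch below naively picks the first codon listed for each amino acid in the standard genetic code; that choice is an assumption for illustration, since real gene design usually picks codons to suit the host organism. This step uses only the genetic code, not structure prediction, and the names `PREFERRED_CODON` and `back_translate` are hypothetical.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical, naive codon choice: keep the first codon seen for each
# amino acid. Real designs choose codons based on the host's codon usage.
PREFERRED_CODON = {}
for codon, aa in CODON_TABLE.items():
    PREFERRED_CODON.setdefault(aa, codon)


def back_translate(protein: str) -> str:
    """Return one DNA sequence, ending in a stop codon, that encodes `protein`."""
    codons = [PREFERRED_CODON[aa] for aa in protein.upper()]
    return "".join(codons) + PREFERRED_CODON["*"]


print(back_translate("MAIV"))  # ATGGCTATTGTTTAA
```

Reading the output back three letters at a time (ATG, GCT, ATT, GTT, TAA) gives M, A, I, V and then a stop codon, so the DNA encodes exactly the requested sequence.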

We can write DNA on a computer, have it printed at a DNA synthesis facility and sent to us in a FedEx envelope within 48 hours, put it into a microbe, and get that microbe to make the protein for us in a day. The lab costs are so low that any high school biology class can do this now. So being able to figure out what DNA to write, based on the objective function of what we want the protein to do, is going to unlock a universe of things we can do in medicine and in environmental science. We can break apart PET plastics, and we can fix nitrogen from the atmosphere and get rid of fertilizer plants. We can create all sorts of new food solutions, health solutions and environmental solutions.