Prediction of protein disorder from amino acid sequence

Structural disorder is vital for proteins’ function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from the amino acid sequence. Researchers at Aarhus University (AU) have developed a prediction tool using machine learning together with experimental NMR data for hundreds of proteins. The tool is expected to be useful for structural studies, as well as for understanding the biological role and regulation of proteins with disordered regions.

8 September 2020 by Lise Refstrup Linnebjerg Pedersen

In the last century, Anfinsen showed beyond doubt that a protein can find its way back to its 'native' three-dimensional structure after being placed under 'denaturing conditions' that unfold it. The profound conclusion of his experiments was that the information governing the search back to the native state is encoded in the amino acid sequence. Thermodynamic considerations then led to a view in which folding is like rolling energetically downhill to the lowest point - the unique native structure. These findings have often been intertwined with the central dogma of molecular biology: a gene codes for an amino acid sequence, and the sequence codes for a specific structure.

Enter intrinsically disordered proteins.

The next breakthrough came with the advent of cheap and fast genome sequencing in the wake of the Human Genome Project. Once thousands of genomes from various organisms had been sequenced, scientists made a staggering discovery - a great many genes coded for proteins of low sequence complexity. In other words, these proteins did not contain the right amino acids to fold up, and experiments confirmed that they remained 'intrinsically disordered'. In the human genome, more than a third of the genes turned out to code for protein disorder!
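To make the notion of 'low complexity' concrete, the sketch below scores a sequence by the Shannon entropy of its amino acid composition in a sliding window. This is a generic textbook-style measure, not the method used by the AU researchers; the function names, the window length of 12 and the example sequences are all invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # A segment made of one repeated residue scores 0 bits; a segment in which
    # all 20 amino acids are equally frequent scores log2(20), about 4.32 bits.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_entropy(sequence: str, window: int = 12) -> list[float]:
    """Entropy of every window of the given length along the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real protein.
    repetitive = "SSSSGSSSSGSSSSGSSSSG"
    diverse = "MKVLAEDCFGHINPQRTWYS"
    print(min(window_entropy(repetitive)), min(window_entropy(diverse)))
```

Low entropy only flags a repetitive composition. Whether such a region is actually disordered still has to be established experimentally, as the article goes on to describe.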

How to detect protein disorder?

Since disordered proteins are very flexible, they are not amenable to crystallization, and therefore no information can be obtained from X-ray diffraction on protein crystals - the approach that has been so pivotal for folded proteins. Instead, these proteins must be studied in solution, and for this purpose NMR (Nuclear Magnetic Resonance) spectroscopy is the best-suited tool. In this method, a quantum physical property called 'spin' is measured in a strong magnetic field for the atomic nuclei in the molecule. The exact precession frequencies of the spins depend on their local environment, and it is this frequency that allows researchers to measure quantitatively the extent to which each amino acid in the protein is ordered or disordered.
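As a rough illustration of how a precession frequency becomes a number that can be compared between nuclei, the sketch below applies two standard NMR relations: the Larmor frequency of a proton in a given field, and the chemical shift in parts per million relative to a reference frequency. The function names and the example frequencies are assumptions made for this illustration. How such shifts are converted into an order/disorder score is not described here, so the sketch stops at the chemical shift.

```python
# Proton gyromagnetic ratio divided by 2*pi, in Hz per tesla (about 42.577 MHz/T).
PROTON_GAMMA_OVER_2PI_HZ_PER_T = 42.577e6


def larmor_frequency_hz(
    b0_tesla: float,
    gamma_over_2pi: float = PROTON_GAMMA_OVER_2PI_HZ_PER_T,
) -> float:
    """Precession (Larmor) frequency of a bare nucleus in a field of b0_tesla."""
    return gamma_over_2pi * b0_tesla


def chemical_shift_ppm(nu_sample_hz: float, nu_reference_hz: float) -> float:
    """Chemical shift: the frequency offset from a reference, in parts per million."""
    return (nu_sample_hz - nu_reference_hz) / nu_reference_hz * 1e6


if __name__ == "__main__":
    # In a 14.1 T magnet, protons precess at roughly 600 MHz.
    print(larmor_frequency_hz(14.1) / 1e6, "MHz")
    # Hypothetical nucleus resonating 4800 Hz above a 600 MHz reference signal.
    print(chemical_shift_ppm(600e6 + 4800.0, 600e6), "ppm")
```

Because the chemical shift is a ratio, it does not depend on the strength of the magnet, which makes values measured in different laboratories directly comparable.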

In their new paper, published on 8 Sept 2020, Dr. Rupashree Dass, together with Associate Professor Frans Mulder and Assistant Professor Jakob Toudahl Nielsen, used machine learning together with experimental NMR data for hundreds of proteins to build a new bioinformatics tool that they have called ODiNPred. This program can help other researchers make the best possible predictions of which regions of their proteins are rigid and which are likely to be flexible. This information is useful for structural studies, as well as for understanding the biological role and regulation of intrinsically disordered proteins.

For further information, please contact

Associate Professor Frans A. A. Mulder
Interdisciplinary Nanoscience Center and Department of Chemistry
Aarhus University
Email: fmulder@chem.au.dk