17 Protein disorder prediction servers
Over the past decade it has become evident that many proteins contain disordered regions, even in their native states. Moreover, some proteins were found to be entirely intrinsically disordered. These go by several names, such as disordered proteins, unstructured proteins and intrinsically unfolded proteins, and they serve several functions. Perhaps the most studied of these is their binding to other proteins in the cell.
There is much interest in characterising these proteins. They play a major role in the cell, and disordered regions often make proteins difficult to purify and crystallize, which creates a bottleneck in high-throughput structure determination. These reasons prompted a broad bioinformatic effort to predict disordered regions in proteins and to identify intrinsically unfolded proteins.
Critical to the development of such methods is a reliable dataset of sequences with experimentally validated order and disorder annotations. The DisProt database is such a resource and is invaluable to the disorder prediction community.
Prediction methods can be roughly divided into three approaches. Ab initio approaches base their predictions on sequence information alone, usually using machine learning techniques such as SVMs, neural networks or Bayesian classifiers. Template-based approaches examine the known structures (or unstructured regions) of similar sequences. Lastly, meta approaches combine the predictions of several servers.
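The meta approach can be illustrated with a minimal sketch. It assumes that each server has already returned a per-residue disorder probability for the same sequence. The unweighted average and the 0.5 cut-off are illustrative choices and are not the combination rule of any particular meta-server. The function name `meta_predict` is hypothetical.

```python
from statistics import mean


def meta_predict(scores_by_predictor, threshold=0.5):
    """Combine per-residue disorder probabilities from several predictors.

    scores_by_predictor: dict mapping a predictor name to a list of
    per-residue probabilities; all lists must cover the same sequence.
    Returns (consensus_scores, labels), where labels is a string of
    'D' (disordered) / 'O' (ordered), one character per residue.
    """
    lengths = {len(s) for s in scores_by_predictor.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same sequence length")
    # Unweighted average at each position: a deliberately simple consensus rule.
    consensus = [mean(column) for column in zip(*scores_by_predictor.values())]
    # Hard classification of the consensus score (cut-off is an assumption).
    labels = "".join("D" if s >= threshold else "O" for s in consensus)
    return consensus, labels
```

For example, averaging three made-up score tracks such as `{"A": [0.9, 0.2, 0.6], "B": [0.7, 0.1, 0.3], "C": [0.8, 0.3, 0.3]}` labels only the first residue as disordered. Real meta-servers may weight their inputs or train a second-level classifier on them instead of averaging.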
It is hard to compare the performance of the different approaches. Each was developed with a different flavour, used a different measure of success and was trained for a specific type of disorder (such as high B-factors, regions with missing coordinates or complete disorder). The CASP experiment has a section devoted to assessing such predictors, but CASP results can be even harder to decipher than a manual comparison.
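One such measure of success is the Matthews correlation coefficient (MCC), which the metaPrDOS entry reports. The sketch below computes it from per-residue labels. It assumes that labels are written as 'D' (disordered, treated as the positive class) and 'O' (ordered). The helper names are hypothetical. Returning 0.0 when a marginal of the confusion matrix is empty is a common convention, not a universal rule.

```python
import math


def confusion(predicted, observed):
    """Count (TP, TN, FP, FN) from two equal-length 'D'/'O' label strings."""
    if len(predicted) != len(observed):
        raise ValueError("length mismatch")
    pairs = list(zip(predicted, observed))
    tp = sum(p == "D" and o == "D" for p, o in pairs)
    tn = sum(p == "O" and o == "O" for p, o in pairs)
    fp = sum(p == "D" and o == "O" for p, o in pairs)
    fn = sum(p == "O" and o == "D" for p, o in pairs)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

Because MCC uses all four cells of the confusion matrix, it is less flattering than plain accuracy on data sets where ordered residues vastly outnumber disordered ones. This is one reason why scores reported under different measures are hard to compare.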
We present here (in arbitrary order) 17 webservers intended to predict disordered regions from protein sequence. We’re aware that there are others out there and welcome any additions in the comments section.
- DisEMBL – a method based on artificial neural networks trained to predict several definitions of disorder. It predicts and displays the probability of disordered segments within a protein sequence. DisEMBL also provides a pipeline interface for bulk predictions, which is essential for large-scale structural genomics. Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, & Russell RB (2003). Protein disorder prediction: implications for structural proteomics. Structure (London, England : 1993), 11 (11), 1453-9 PMID: 14604535
- DISOPRED2 – DISOPRED2 was trained on a set of 750 non-redundant sequences with high-resolution X-ray structures. Disorder was defined as those residues that appear in the sequence records but whose coordinates are missing from the electron density map. A sequence profile was generated for each protein using a PSI-BLAST search against a filtered sequence database. The input vector for each residue was constructed from the profile over a symmetric window of fifteen positions. These data were used to train linear support vector machines (SVMs). Ward, J., Sodhi, J., McGuffin, L., Buxton, B., & Jones, D. (2004). Prediction and Functional Analysis of Native Disorder in Proteins from the Three Kingdoms of Life. Journal of Molecular Biology, 337 (3), 635-645 DOI: 10.1016/j.jmb.2004.02.002
- DRIPPRED – Structural disorder is predicted by looking for sequence patterns that are not typically found in the PDB. The method also incorporates PSIPRED predictions. Order/Disorder Prediction With Self Organising Maps
- Scratch (DISpro) – DISpro uses a 1D-RNN to predict the probability that each residue is disordered. The probabilities are also thresholded at 0.5 to make a hard classification. As input, it uses the sequence profile, predicted secondary structure and predicted relative solvent accessibility. Cheng, J., Sweredoski, M., & Baldi, P. (2005). Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data. Data Mining and Knowledge Discovery, 11 (3), 213-222 DOI: 10.1007/s10618-005-0001-y
- FoldIndex – predicts whether a given protein sequence is intrinsically unfolded by implementing the algorithm of Uversky and co-workers, which is based on the average residue hydrophobicity and net charge of the sequence. FoldIndex© has an error rate comparable to that of more sophisticated fold prediction methods. Sliding windows allow the identification of large regions within a protein whose folding propensities differ from those of the whole protein. Prilusky, J. (2005). FoldIndex(C): a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics, 21 (16), 3435-3438 DOI: 10.1093/bioinformatics/bti537
- GlobPlot 2 – a tool to identify regions of globularity and disorder within protein sequences. It is a simple approach based on a running sum of the propensity of amino acids to be in an ordered or disordered state. Despite its simplicity, the regions it identifies agree well with domain databases and sets of known disordered proteins. Linding, R. (2003). GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Research, 31 (13), 3701-3708 DOI: 10.1093/nar/gkg519
- IUPred – predicts disorder by estimating the capacity of polypeptides to form stabilizing contacts. The underlying assumption is that globular proteins make a large number of inter-residue interactions, which provide the stabilizing energy needed to overcome the entropy loss during folding. In contrast, intrinsically unfolded proteins (IUPs) have sequences that cannot form sufficient inter-residue interactions. The method estimates pairwise interaction energies using parameters derived from globular proteins. The estimated energies of IUP sequences are clearly shifted towards less favorable values compared with those of globular proteins. Dosztanyi, Z. (2005). IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics, 21 (16), 3433-3434 DOI: 10.1093/bioinformatics/bti541
- PONDR – works from primary sequence data alone. The predictors are feedforward neural networks that use sequence information from windows of, generally, 21 amino acids. Attributes such as the fractional composition of particular amino acids or hydropathy are calculated over this window and used as inputs for the predictor. The neural network, which has been trained on a specific set of ordered and disordered sequences, then outputs a value for the central amino acid in the window. The predictions are then smoothed over a sliding window of 9 amino acids. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, & Dunker AK (2001). Sequence complexity of disordered protein. Proteins, 42 (1), 38-48 PMID: 11093259
- RONN – an application of the ‘bio-basis function neural network’ pattern recognition algorithm to the detection of natively disordered regions in proteins. The results of blind-testing a panel of nine disorder prediction tools against 80 protein sequences derived from the PDB show that RONN performed best according to the probability excess measure. Yang, Z. (2005). RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics, 21 (16), 3369-3376 DOI: 10.1093/bioinformatics/bti534
- Spritz – predicts ordered and disordered residues using two specialised binary classifiers, both implemented with probabilistic soft-margin support vector machines (C-SVM). The SVM-LD (LD: long disorder) classifier is trained on a subset of non-redundant sequences known to contain only long disordered protein fragments (>=30 AA). The SVM-SD (SD: short disorder) classifier is instead trained on a subset of non-redundant sequences with only short disordered fragments. Vullo A, Bortolami O, Pollastri G, & Tosatto SC (2006). Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic acids research, 34 (Web Server issue) PMID: 16844983
- FoldUnfold – introduces a new parameter, the mean packing density of residues, to detect disordered regions in a protein sequence. Regions with low expected packing density are predicted to be disordered. The method was tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins), and it showed improved performance over some other widely used methods. Galzitskaya OV, Garbuzynskiy SO, & Lobanov MY (2006). FoldUnfold: web server for the prediction of disordered regions in protein chain. Bioinformatics (Oxford, England), 22 (23), 2948-9 PMID: 17021161
- DisProt – an algorithm that partitions protein disorder into flavors based on competition among increasing numbers of predictors. Prediction accuracy determines both the number of distinct predictors and the partitioning of the individual proteins. Vucetic, S., Brown, C., Dunker, A., & Obradovic, Z. (2003). Flavors of protein disorder. Proteins: Structure, Function, and Genetics, 52 (4), 573-584 DOI: 10.1002/prot.10437
- PreDisorder – a target protein sequence is first aligned against several template profiles using PSI-BLAST, which creates an input profile for the sequence. This profile, along with the predicted secondary structure and solvent accessibility, is fed into a 1D recursive neural network (1D-RNN) that makes the disorder predictions. PreDisorder ranked among the top predictors in CASP8. Deng X, Eickholt J, & Cheng J (2009). PreDisorder: ab initio sequence-based prediction of protein disordered regions. BMC bioinformatics, 10 (1) PMID: 20025768
- metaPrDOS – uses the following eight prediction systems to predict disordered regions: PrDOS, DISOPRED2, DisEMBL, DISPROT (VSL2), DISpro, IUPred, POODLE-S and DISOclust. Its performance was estimated using the CASP7 prediction targets as the test set, on which metaPrDOS yielded an ROC score of 0.877 and an MCC of 0.440. The authors report that these values were superior to those obtained by the other prediction methods that participated in CASP7. Ishida T, & Kinoshita K (2008). Prediction of disordered regions in proteins based on the meta approach. Bioinformatics (Oxford, England), 24 (11), 1344-8 PMID: 18426805
- POODLE – POODLE (Prediction Of Order and Disorder by machine LEarning) is a system that predicts disordered regions from amino acid sequence alone. It consists of three predictors: one for short disordered regions, one for long disordered regions and one for unfolded proteins.
- PrDOS – composed of two predictors, one based on the local amino acid sequence and one based on template proteins. The first is implemented as an SVM applied to the position-specific score matrix of the input sequence, with a sliding window used to map individual residues into a feature space. The second assumes that intrinsic disorder is conserved within protein families and is simply implemented using PSI-BLAST. The final prediction combines the results of the two predictors. In CASP7, the method achieved high performance (estimated accuracy > 90% with a sensitivity of 0.56), especially for short disordered regions. Ishida, T., & Kinoshita, K. (2007). PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Research, 35 (Web Server) DOI: 10.1093/nar/gkm363
- DISOclust – The DISOclust method provides predictions of protein disorder based on the analysis of 3D structural models using ModFOLDclust. The DISOclust results are combined with those from an in-house version of DISOPRED in order to improve predictions. McGuffin LJ (2008). Intrinsic disorder prediction from the analysis of multiple protein fold recognition models. Bioinformatics (Oxford, England), 24 (16), 1798-804 PMID: 18579567
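The per-residue input construction described in the DISOPRED2 entry, a symmetric window of fifteen profile positions, can be sketched as follows. This is an illustration under stated assumptions, not DISOPRED2's code. It assumes the profile is a list of rows with 20 scores each and that positions beyond either terminus are zero-padded. The real encoding may differ in these details. The function name `window_features` is hypothetical, and the SVM training step is left out.

```python
def window_features(profile, window=15):
    """Build one flat input vector per residue from a symmetric window.

    profile: list of per-residue rows (e.g. 20 PSI-BLAST profile scores each).
    Positions that fall off either terminus are zero-padded (an assumption).
    Returns a list of vectors, each of length window * row_length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not profile:
        return []
    width = len(profile[0])
    half = window // 2
    pad = [0.0] * width
    features = []
    for i in range(len(profile)):
        vec = []
        # Concatenate rows i-half .. i+half, so the centre residue sits in the middle.
        for j in range(i - half, i + half + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(vec)
    return features
```

With a 20-column profile and the default window, each residue becomes a 300-dimensional vector. Such vectors could then be passed to any linear SVM implementation, for example scikit-learn's `LinearSVC`.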
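Two post-processing steps mentioned in the list can be sketched together: PONDR smooths its raw outputs over a sliding window of 9 amino acids, and DISpro thresholds probabilities at 0.5 to make a hard classification. The sketch below is a minimal illustration, not either server's code. The shrinking window at the termini and the use of `>=` at the threshold are assumptions, and the function names are hypothetical.

```python
def smooth(scores, window=9):
    """Centred moving average; the window is truncated at the sequence ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def to_segments(scores, threshold=0.5):
    """Classify residues (score >= threshold -> disordered) and return
    half-open (start, end) index ranges of consecutive disordered residues."""
    segments, start = [], None
    # A trailing -inf sentinel closes a run that reaches the C-terminus.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            segments.append((start, i))
            start = None
    return segments
```

For example, `to_segments([0.1, 0.6, 0.7, 0.2, 0.9])` gives `[(1, 3), (4, 5)]`. Smoothing the scores first tends to merge such short runs and remove isolated spikes.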
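The charge-hydropathy idea behind FoldIndex can be sketched in a few lines. This sketch assumes the form in which the Uversky boundary is usually quoted for FoldIndex: score = 2.785·<H> − |<R>| − 1.151. Here <H> is the mean Kyte-Doolittle hydropathy rescaled to the range 0 to 1, and <R> is the mean net charge per residue. A negative score suggests an unfolded sequence. The charge model (Lys/Arg +1, Asp/Glu −1, His neutral) and the 51-residue window are simplifying assumptions, so treat this as an approximation, not the server's implementation.

```python
# Kyte-Doolittle hydropathy values (range -4.5 to 4.5).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(seq):
    """Charge-hydropathy score; negative values suggest an unfolded sequence."""
    seq = seq.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("expected a non-empty sequence of the 20 standard residues")
    # Mean hydropathy, rescaled from [-4.5, 4.5] to [0, 1].
    h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    # Net charge per residue at neutral pH (simplified: K, R = +1; D, E = -1).
    net = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    r = net / len(seq)
    return 2.785 * h - abs(r) - 1.151


def windowed_fold_index(seq, window=51):
    """Score every full-length window; returns (window_start, score) pairs."""
    return [(i, fold_index(seq[i:i + window])) for i in range(len(seq) - window + 1)]
```

A poly-glutamate stretch scores strongly negative (low hydropathy, high net charge), whereas a poly-isoleucine stretch scores positive. Scanning with `windowed_fold_index` mirrors the sliding-window use described in the FoldIndex entry.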
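The running-sum idea described in the GlobPlot 2 entry can be sketched as follows. Each residue contributes a propensity value, the values are summed cumulatively along the chain, and stretches where the running sum trends upwards are reported as putatively disordered. The two-level propensity scale below is hypothetical and chosen only to make the mechanics visible. It is not GlobPlot's published propensity set. The window and minimum segment length are also illustrative assumptions.

```python
from itertools import accumulate

# Hypothetical two-level scale for illustration only; NOT GlobPlot's
# published propensities. +1 = disorder-favouring, -1 = order-favouring.
DISORDER_FAVOURING = set("ARGQSPEK")
ORDER_FAVOURING = set("WCFIYVLN")


def propensity(aa):
    """Toy per-residue propensity (see the caveat above)."""
    if aa in DISORDER_FAVOURING:
        return 1.0
    if aa in ORDER_FAVOURING:
        return -1.0
    return 0.0


def running_sum(seq):
    """Cumulative sum of propensities along the sequence."""
    return list(accumulate(propensity(aa) for aa in seq.upper()))


def disordered_segments(seq, window=7, min_len=5):
    """Half-open (start, end) ranges where the running sum trends upwards.

    The slope at residue i is the rise of the running sum across a window
    centred on i, divided by the window width.
    """
    sums = [0.0] + running_sum(seq)  # sums[k] = total over seq[:k]
    n, half = len(seq), window // 2
    rising = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        rising.append((sums[hi] - sums[lo]) / (hi - lo) > 0)
    segments, start = [], None
    for i, up in enumerate(rising + [False]):  # sentinel closes a final run
        if up and start is None:
            start = i
        elif not up and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```

For a toy sequence of ten Trp, ten Glu and ten Trp residues, `disordered_segments` returns `[(10, 20)]`, which is the Glu block where the running sum climbs. Plotting the output of `running_sum` against residue position gives the kind of curve in which upward slopes flag disorder.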
[...] of prediction servers have been created to address this problem (Nir put together a nice list here). The metaPrDOS (meta Protein DisOrder prediction System) is convenient in that it predicts [...]