5 structural search engines – who are your protein’s neighbors?
It is well known that proteins sharing high sequence identity usually share the same fold and approximately the same structure; this is the basis for homology modeling. It is also well known that some folds contain proteins with very different sequences. These cases pose a problem for sequence-based structural alignments and sequence-based search protocols. For that reason, algorithms were devised to quickly and efficiently search a structure database (say, the entire PDB) for proteins with a similar structure, though not necessarily a similar sequence.
We present a list of 5 such algorithms, each implemented as a publicly available web server for your convenience (some are also available for download). These programs rely on sophisticated superposition algorithms, so you can also use most of them simply to superimpose two structures accurately.
The SSM algorithm matches graphs built on the proteins’ secondary-structure elements, then refines the result with an iterative three-dimensional alignment of the protein backbone C-alpha atoms. It allows for: pairwise comparison and 3D alignment of protein structures, multiple comparison and 3D alignment of protein structures, examination of a protein structure for similarity with the whole PDB or SCOP archives, and best C-alpha alignment of the compared structures. A downloadable version is also available.
The Dali server reduces the total computational cost of searching against a huge database by pruning search space using prior knowledge about the distribution of structures in fold space. If you want to know the structural neighbours of a protein already in the Protein Data Bank (PDB), you can find them in the Dali Database. If you want to superimpose two particular structures, you can do it in the pairwise DaliLite server.
At the heart of VAST’s significance calculation is the definition of the “unit” of tertiary structure similarity as pairs of secondary structure elements (SSEs) that have similar type, relative orientation, and connectivity. In comparing two protein domains, the most surprising substructure similarity is the one where the sum of superposition scores across these “units” is greatest. The likelihood that this similarity would be seen by chance is then given as a simple product: the probability that one would obtain this score in drawing so many “units” at random, times the number of alternative SSE-pair combinations possible in the domain comparison, from which one has chosen the best. Protein structure neighbors in Entrez are determined by direct comparison of 3-dimensional protein structures with the VAST algorithm. Each of the more than 87,804 domains in MMDB is compared to every other one. If you already know a PDB/MMDB-Id, you can try using this pre-compiled set.
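The “simple product” described above can be sketched in a few lines. This is only an illustration of the arithmetic, not VAST’s actual code: the function name `vast_like_p_value`, its two inputs, and the cap at 1.0 are assumptions made for this example, and the probability of the score itself is taken as given rather than computed.

```python
def vast_like_p_value(p_score_at_random: float, n_alternatives: int) -> float:
    """Chance likelihood of the best substructure similarity (illustrative only).

    p_score_at_random: probability of reaching the observed score when the
        same number of SSE-pair "units" is drawn at random (assumed known).
    n_alternatives: number of alternative SSE-pair combinations from which
        the best one was chosen.
    """
    if not 0.0 <= p_score_at_random <= 1.0:
        raise ValueError("p_score_at_random must be a probability")
    if n_alternatives < 1:
        raise ValueError("n_alternatives must be at least 1")
    # The product corrects for having picked the best of many alternatives.
    # Capping at 1.0 is an assumption of this sketch so the result stays a
    # valid probability.
    return min(1.0, p_score_at_random * n_alternatives)


if __name__ == "__main__":
    # A rare score (1e-8) chosen as the best of 5000 alternatives.
    print(vast_like_p_value(1e-8, 5000))
```

The multiplication is what makes a hit in a large domain less impressive than the same score in a small one: the more combinations there were to choose from, the easier it is to find a good one by chance.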
CE is a method for calculating pairwise structure alignments. CE aligns two polypeptide chains using characteristics of their local geometry as defined by vectors between C-alpha positions. Matches are termed aligned fragment pairs (AFPs). Heuristics are used in defining a set of optimal paths joining AFPs with gaps as needed. The path with the best RMSD is subject to dynamic programming to achieve an optimal alignment. For specific families of proteins, additional characteristics are used to weight the alignment. Databases of alignments for all polypeptide chains and for a representative set of proteins are available and kept current with the PDB. Searches can be performed against all of the PDB or against representatives only.
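To make the idea of an aligned fragment pair concrete, here is a minimal sketch that compares the local geometry of two fragments through their internal C-alpha distances. It is a simplification, not CE’s implementation: the function names, the fragment size of 8, and the 3.0 Angstrom cutoff are assumptions for illustration, and the path-joining and dynamic-programming steps are left out.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def intra_distances(ca: Sequence[Coord], start: int, size: int) -> List[float]:
    """All pairwise C-alpha distances inside one fragment of `size` residues."""
    frag = ca[start:start + size]
    return [math.dist(frag[a], frag[b])
            for a in range(size) for b in range(a + 1, size)]


def fragment_dissimilarity(ca_a: Sequence[Coord], i: int,
                           ca_b: Sequence[Coord], j: int, size: int) -> float:
    """Mean absolute difference between the two fragments' internal distances."""
    da = intra_distances(ca_a, i, size)
    db = intra_distances(ca_b, j, size)
    return sum(abs(x - y) for x, y in zip(da, db)) / len(da)


def find_afps(ca_a: Sequence[Coord], ca_b: Sequence[Coord],
              size: int = 8, cutoff: float = 3.0) -> List[Tuple[int, int]]:
    """Return (i, j) start positions whose fragments have similar local geometry.

    `size` and `cutoff` are illustrative values chosen for this sketch.
    """
    afps = []
    for i in range(len(ca_a) - size + 1):
        for j in range(len(ca_b) - size + 1):
            if fragment_dissimilarity(ca_a, i, ca_b, j, size) <= cutoff:
                afps.append((i, j))
    return afps


if __name__ == "__main__":
    # A toy helix-like chain and a translated copy of it.
    chain = [(float(k), math.sin(k), math.cos(k)) for k in range(10)]
    moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in chain]
    print(find_afps(chain, moved, size=8, cutoff=0.01))
```

Because internal distances do not change when a structure is rotated or translated, fragments can be compared this way before any superposition has been computed.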
The famous DejaVu package from Gerard J. Kleywegt (available from the Uppsala Software Factory) is also implemented as a server. It allows searching for a motif of secondary structure elements. A motif consists of N SSEs, each of which comprises M(i) residues and has a length of L(i) Angstrom (measured from the first residue’s C-alpha to that of the last residue). The motif is characterised by a matrix D(i,j), which contains the centre-to-centre distances (for example), and by another matrix C(i,j), which contains the cosines of the angles made by the direction vectors of the individual elements (the direction vector goes from the N-terminal C-alpha to the C-terminal one). Finding a motif in the database that is similar to the one that occurs in your protein then comes down to finding suitable collections of N SSEs in the structures of other proteins which have approximately the same numbers of residues, the same lengths, and comparable mutual distances and direction-vector cosines. The input to the server is a PDB file with a secondary structure motif. The secondary structure elements of the PDB file will be assigned automatically. You may input superposition criteria based on which the server will find similar secondary structure motifs. Any of the hits may be viewed either separately or superimposed.
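The L(i), D(i,j) and C(i,j) quantities described above are easy to compute once each SSE is reduced to its first and last C-alpha. The sketch below is not DejaVu’s code: the function names, the tolerance values, and the choice of the endpoint midpoint as the “centre” are assumptions for illustration, and the residue counts M(i) are ignored.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
SSE = Tuple[Point, Point]  # (first C-alpha, last C-alpha) of one element


def centre_and_direction(sse: SSE):
    """Return (centre, unit direction vector, length L) for one SSE."""
    start, end = sse
    # Assumption: the centre is the midpoint of the two terminal C-alphas.
    centre = tuple((s + e) / 2.0 for s, e in zip(start, end))
    length = math.dist(start, end)
    if length == 0.0:
        raise ValueError("SSE endpoints must differ")
    # Direction vector goes from the N-terminal C-alpha to the C-terminal one.
    direction = tuple((e - s) / length for s, e in zip(start, end))
    return centre, direction, length


def motif_matrices(sses: Sequence[SSE]):
    """Return (lengths L, distance matrix D, cosine matrix C) for a motif."""
    described = [centre_and_direction(s) for s in sses]
    n = len(described)
    dist = [[math.dist(described[i][0], described[j][0]) for j in range(n)]
            for i in range(n)]
    cos = [[sum(a * b for a, b in zip(described[i][1], described[j][1]))
            for j in range(n)] for i in range(n)]
    lengths = [d[2] for d in described]
    return lengths, dist, cos


def motifs_match(query: Sequence[SSE], target: Sequence[SSE],
                 max_len_diff: float = 3.0, max_dist_diff: float = 2.0,
                 max_cos_diff: float = 0.3) -> bool:
    """True if two equally sized motifs agree within the given tolerances."""
    ql, qd, qc = motif_matrices(query)
    tl, td, tc = motif_matrices(target)
    if len(ql) != len(tl):
        return False
    for i in range(len(ql)):
        if abs(ql[i] - tl[i]) > max_len_diff:
            return False
        for j in range(i + 1, len(ql)):
            if abs(qd[i][j] - td[i][j]) > max_dist_diff:
                return False
            if abs(qc[i][j] - tc[i][j]) > max_cos_diff:
                return False
    return True


if __name__ == "__main__":
    # Two antiparallel elements 5 Angstrom apart versus a parallel pair.
    hairpin = [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
               ((10.0, 5.0, 0.0), (0.0, 5.0, 0.0))]
    parallel = [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
                ((0.0, 5.0, 0.0), (10.0, 5.0, 0.0))]
    print(motifs_match(hairpin, hairpin), motifs_match(hairpin, parallel))
```

The cosine matrix is what separates the two toy motifs here: their lengths and centre-to-centre distances are identical, but the relative direction of the second element is reversed.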
Do you know of other servers or software that offer this functionality? Have you used one of the above? Tell us in the comments.