Mar 23, 2009

5 structural search engines – who are your protein’s neighbors?

It is well known that proteins sharing high sequence identity usually share the same fold and approximately the same structure; this is the basis for homology modeling. It is also well known that some folds contain proteins with very different sequences. These cases pose a problem for sequence-based structural alignments and for sequence-based search protocols. For that reason, algorithms were devised to quickly and efficiently search a structure collection (say, the entire PDB) for proteins with a similar structure, though not necessarily a similar sequence.

We present a list of 5 such algorithms, each implemented in a publicly available web server for your convenience (some are even available for download). These tools rely on sophisticated superposition algorithms, so you can also use most of them simply to superimpose two structures accurately.

The SSM algorithm matches graphs built on the proteins' secondary-structure elements, then iteratively aligns the protein backbone C-alpha atoms in three dimensions. It allows for: pairwise comparison and 3D alignment of protein structures; multiple comparison and 3D alignment of protein structures; examination of a protein structure for similarity with the whole PDB or SCOP archives; and the best C-alpha alignment of the compared structures. A downloadable version is also available.

The Dali server reduces the total computational cost of searching against a huge database by pruning the search space using prior knowledge about the distribution of structures in fold space. If you want to know the structural neighbours of a protein already in the Protein Data Bank (PDB), you can find them in the Dali Database. If you want to superimpose two particular structures, you can do it with the pairwise DaliLite server.

At the heart of VAST's significance calculation is the definition of the "unit" of tertiary structure similarity as pairs of secondary structure elements (SSEs) that have similar type, relative orientation, and connectivity. In comparing two protein domains, the most surprising substructure similarity is the one where the sum of superposition scores across these "units" is greatest. The likelihood that this similarity would be seen by chance is then given as a simple product: the probability that one would obtain this score in drawing so many "units" at random, times the number of alternative SSE-pair combinations possible in the domain comparison, from which one has chosen the best. Protein structure neighbors in Entrez are determined by direct comparison of 3-dimensional protein structures with the VAST algorithm. Each of the more than 87,804 domains in MMDB is compared to every other one. If you already know a PDB/MMDB-Id, you can try using this pre-compiled set.
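The "simple product" above can be written out in a few lines. This is only a sketch of the arithmetic as described in the paragraph, not VAST's actual code: the function name, the example numbers, and the cap at 1.0 are assumptions made for illustration, and both inputs are taken as given rather than computed.

```python
def chance_likelihood(p_score_at_random, n_alternative_combinations):
    """Likelihood of seeing the best substructure similarity by chance.

    p_score_at_random: probability of obtaining this score when drawing
        the same number of SSE-pair "units" at random (taken as an input).
    n_alternative_combinations: number of alternative SSE-pair combinations
        possible in the domain comparison (also taken as an input).
    """
    if not 0.0 <= p_score_at_random <= 1.0:
        raise ValueError("p_score_at_random must be a probability")
    if n_alternative_combinations < 1:
        raise ValueError("need at least one combination")
    # The product itself; capping at 1.0 is an assumption added here so the
    # result can still be read as a probability.
    return min(1.0, p_score_at_random * n_alternative_combinations)


# Hypothetical numbers, purely for illustration.
print(chance_likelihood(1e-8, 5000))
```

The multiplication by the number of alternatives is what penalizes picking the best of many possible substructure matches.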

CE is a method for calculating pairwise structure alignments. CE aligns two polypeptide chains using characteristics of their local geometry as defined by vectors between C-alpha positions. Matches are termed aligned fragment pairs (AFPs). Heuristics are used in defining a set of optimal paths joining AFPs, with gaps as needed. The path with the best RMSD is subjected to dynamic programming to achieve an optimal alignment. For specific families of proteins, additional characteristics are used to weight the alignment. Databases of alignments for all polypeptide chains, and for a representative set of proteins, are available and kept current with the PDB. Searches can be performed against all of the PDB or against its representatives.
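To make the idea of an aligned fragment pair concrete, here is a minimal sketch that compares two fragments through their internal C-alpha distances, which do not change under rotation or translation. It is a simplification, not CE's published scoring: the function names, the fragment size of 8, and the 3.0 Angstrom cutoff are assumptions for illustration.

```python
import math


def internal_distances(fragment):
    """All-against-all C-alpha distances within one fragment."""
    return [[math.dist(a, b) for b in fragment] for a in fragment]


def fragment_dissimilarity(frag_a, frag_b):
    """Mean absolute difference between the two internal distance matrices."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same length")
    da, db = internal_distances(frag_a), internal_distances(frag_b)
    m = len(frag_a)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(m) for j in range(m))
    return total / (m * m)


def find_afps(chain_a, chain_b, size=8, cutoff=3.0):
    """Return (start_a, start_b, dissimilarity) for every fragment pair
    whose local geometry agrees to within the (assumed) cutoff."""
    afps = []
    for i in range(len(chain_a) - size + 1):
        for j in range(len(chain_b) - size + 1):
            d = fragment_dissimilarity(chain_a[i:i + size], chain_b[j:j + size])
            if d < cutoff:
                afps.append((i, j, d))
    return afps


# Toy chains: an extended strand along x, and the same shape moved and
# turned to run along y. Every fragment pair matches.
chain_a = [(3.8 * k, 0.0, 0.0) for k in range(10)]
chain_b = [(1.0, 3.8 * k + 2.0, 5.0) for k in range(10)]
print(len(find_afps(chain_a, chain_b)))
```

Joining the resulting pairs into a path and refining it is the part the paragraph above attributes to heuristics and dynamic programming; this sketch stops at listing the candidate pairs.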

The famous DejaVu package from Gerard J. Kleywegt (available from the Uppsala Software Factory) is also available as a server. It allows searching for a motif of secondary structure elements (SSEs). A motif consists of N SSEs, each of which comprises M(i) residues and has a length of L(i) Angstrom (measured from the first residue's C-alpha to that of the last residue). The motif is characterised by a matrix D(i,j), which contains the centre-to-centre distances (for example), and by another matrix C(i,j), which contains the cosines of the angles made by the direction vectors of the individual elements (the direction vector goes from the N-terminal C-alpha to the C-terminal one). Finding a motif in the database that is similar to the one in your protein then comes down to finding suitable collections of N SSEs in the structures of other proteins that have approximately the same numbers of residues, the same lengths, and comparable mutual distances and direction-vector cosines. The input to the server is a PDB file with a secondary structure motif. The secondary structure elements of the PDB file are assigned automatically. You may enter superpositioning criteria, based on which the server will find similar secondary structure motifs. Any of the hits may be viewed either separately or superimposed.
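The motif description above translates fairly directly into code. The sketch below builds L(i), D(i,j) and C(i,j) from the first and last C-alpha of each SSE and compares two motifs element by element. It is not DejaVu's implementation: the function names and all tolerance values are hypothetical, the centre of an SSE is assumed to be the midpoint of its two end C-alphas, and the SSEs are assumed to be already listed in corresponding order.

```python
import math


def describe_motif(sses):
    """sses: list of (first_ca, last_ca, n_residues), one entry per SSE.
    Returns (M, L, D, C) in the notation of the text above.
    Assumes every SSE has non-zero length."""
    M, L, centres, directions = [], [], [], []
    for first_ca, last_ca, n_residues in sses:
        vec = [b - a for a, b in zip(first_ca, last_ca)]  # N-term to C-term
        length = math.sqrt(sum(v * v for v in vec))
        M.append(n_residues)
        L.append(length)
        directions.append([v / length for v in vec])
        # Assumption: centre = midpoint of the first and last C-alpha.
        centres.append([(a + b) / 2 for a, b in zip(first_ca, last_ca)])
    n = len(sses)
    D = [[math.dist(centres[i], centres[j]) for j in range(n)] for i in range(n)]
    C = [[sum(x * y for x, y in zip(directions[i], directions[j]))
          for j in range(n)] for i in range(n)]
    return M, L, D, C


def motifs_similar(a, b, max_res_diff=3, max_len_diff=5.0,
                   max_dist_diff=5.0, max_cos_diff=0.3):
    """Element-wise comparison; all tolerances are hypothetical defaults."""
    (Ma, La, Da, Ca), (Mb, Lb, Db, Cb) = a, b
    if len(Ma) != len(Mb):
        return False
    for i in range(len(Ma)):
        if abs(Ma[i] - Mb[i]) > max_res_diff or abs(La[i] - Lb[i]) > max_len_diff:
            return False
        for j in range(i + 1, len(Ma)):
            if abs(Da[i][j] - Db[i][j]) > max_dist_diff:
                return False
            if abs(Ca[i][j] - Cb[i][j]) > max_cos_diff:
                return False
    return True


# Toy motifs: an antiparallel pair, the same pair shifted, and a parallel pair.
hairpin = describe_motif([((0, 0, 0), (15, 0, 0), 11), ((15, 10, 0), (0, 10, 0), 11)])
shifted = describe_motif([((5, 5, 5), (20, 5, 5), 11), ((20, 15, 5), (5, 15, 5), 11)])
parallel = describe_motif([((0, 0, 0), (15, 0, 0), 11), ((0, 10, 0), (15, 10, 0), 11)])
print(motifs_similar(hairpin, shifted), motifs_similar(hairpin, parallel))
```

Because D and C depend only on relative geometry, the shifted copy matches, while the parallel pair is rejected on its direction-vector cosine alone.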

 

Do you know of other servers or software that offers this functionality? Have you used one of the above? Tell us in the comments.

Written by Nir London in: Resources


    • http://bytesizebio.net Iddo Friedberg

      FATCAT http://fatcat.burnham.org/

      Protein structures are flexible and undergo structural rearrangements as part of their function. FATCAT (Flexible structure AlignmenT by Chaining Aligned fragment pairs allowing Twists) is an approach for flexible protein structure comparison. It simultaneously addresses the two major goals of flexible structure alignment: optimizing the alignment and minimizing the number of rigid-body movements (twists) around pivot points (hinges) introduced in the reference structure. In FATCAT, the structure alignment is formulated as a chaining process over AFPs (aligned fragment pairs) that allows at most t twists; the flexible structure alignment reduces to a rigid structure alignment when t is forced to be 0. Dynamic programming is used to find the optimal chaining.
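The chaining with at most t twists described in this comment can be sketched as a small dynamic program. This is a toy model, not FATCAT's scoring scheme: the `AFP` record, the `frame` label standing in for "superimposed by the same rigid-body transformation", and the flat twist penalty are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class AFP(NamedTuple):
    start_a: int   # start position in structure A
    start_b: int   # start position in structure B
    length: int
    score: float
    frame: int     # hypothetical label: AFPs sharing a label need no twist


def best_chain_score(afps: List[AFP], max_twists: int,
                     twist_penalty: float = 1.0) -> float:
    """Best total score of a chain of non-overlapping, ordered AFPs that
    uses at most max_twists twists."""
    order = sorted(afps, key=lambda f: (f.start_a, f.start_b))
    neg = float("-inf")
    # best[k][t]: best chain ending at AFP k that used exactly t twists.
    best = [[neg] * (max_twists + 1) for _ in order]
    for k, cur in enumerate(order):
        best[k][0] = cur.score
        for p in range(k):
            prev = order[p]
            # prev must end before cur starts in both structures.
            if (prev.start_a + prev.length > cur.start_a
                    or prev.start_b + prev.length > cur.start_b):
                continue
            twist = 0 if prev.frame == cur.frame else 1
            for t in range(max_twists + 1 - twist):
                if best[p][t] == neg:
                    continue
                candidate = best[p][t] + cur.score - twist * twist_penalty
                if candidate > best[k][t + twist]:
                    best[k][t + twist] = candidate
    return max((max(row) for row in best), default=0.0)


# Two AFPs in one rigid frame, and a third that needs a twist to join them.
afps = [AFP(0, 0, 8, 8.0, 0), AFP(10, 10, 8, 8.0, 0), AFP(20, 22, 8, 8.0, 1)]
print(best_chain_score(afps, max_twists=0))  # rigid alignment only
print(best_chain_score(afps, max_twists=1))  # one hinge allowed
```

With `max_twists=0` the third fragment cannot join the chain, which mirrors the statement that the flexible alignment becomes a rigid one when t is forced to be 0.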

    • http://emreguney.googlepages.com Emre Guney

      Several structural comparison methods exist that use the geometric hashing paradigm to recognize common substructures of given structures.

      http://bioinfo3d.cs.tau.ac.il/servers.html

      The algorithm works in the following way:
      Each point of a structure (corresponding to the position of a C-alpha atom) is redefined with respect to the coordinate bases of all possible non-collinear point triplets in the structure, so that the object is described in a way that is invariant to rigid transformation (rotation and translation). The redefined points are then inserted into a hash table: the redefined point is the key of the hash table entry, and the value is the basis with respect to which the point is defined, together with the identifier of the structure. Once such a hash table is constructed from the available structures, a new structure can be searched for similarities with the structures in the hash table. The points of the new structure are redefined as in the first step of the hash table construction, the hash table is queried with the redefined points, and each matching pair of basis set and structure identifier is given a vote. Whichever structure (or subset of the structure) gets the most votes, when votes are grouped by basis set, is concluded to be structurally similar to the given structure.
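The hashing and voting steps in this comment can be sketched on toy coordinates. This is a minimal illustration of the paradigm, not the code behind the servers linked above: the function names, the way the frame is built from a triplet, and the 1.0 Angstrom bin size are assumptions, and the brute-force loop over all triplets is only practical for very small point sets.

```python
import itertools
import math
from collections import Counter, defaultdict


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    n = math.sqrt(_dot(v, v))
    return None if n < 1e-6 else (v[0] / n, v[1] / n, v[2] / n)


def frames(points):
    """Yield (triplet, frame) for every ordered non-collinear point triplet."""
    for i, j, k in itertools.permutations(range(len(points)), 3):
        e1 = _unit(_sub(points[j], points[i]))
        if e1 is None:
            continue
        e3 = _unit(_cross(e1, _sub(points[k], points[i])))
        if e3 is None:  # the three points are collinear
            continue
        yield (i, j, k), (points[i], e1, _cross(e3, e1), e3)


def keys_in_frame(points, frame, bin_size):
    """Quantized coordinates of all points expressed in the given frame."""
    origin, e1, e2, e3 = frame
    return {tuple(round(_dot(_sub(p, origin), e) / bin_size) for e in (e1, e2, e3))
            for p in points}


def build_table(structures, bin_size=1.0):
    """Hash table: redefined point -> set of (structure id, basis)."""
    table = defaultdict(set)
    for name, pts in structures.items():
        for basis, frame in frames(pts):
            for key in keys_in_frame(pts, frame, bin_size):
                table[key].add((name, basis))
    return table


def best_match(table, query, bin_size=1.0):
    """Return (structure id, votes) for the best (structure, basis) pair."""
    best = (None, 0)
    for _basis, frame in frames(query):
        votes = Counter()
        for key in keys_in_frame(query, frame, bin_size):
            votes.update(table.get(key, ()))
        if votes:
            (name, _model_basis), n = votes.most_common(1)[0]
            if n > best[1]:
                best = (name, n)
    return best


# Toy "structures"; the query is A rotated 90 degrees about z and translated.
structures = {
    "A": [(0, 0, 0), (3.8, 0, 0), (5.1, 3.5, 0.4), (3.9, 6.2, 2.9), (6.5, 8.1, 5.2)],
    "B": [(0, 0, 0), (10, 0, 0), (10, 10, 0), (0, 10, 10), (20, 5, 5)],
}
query = [(-y + 10.0, x - 5.0, z + 3.0) for x, y, z in structures["A"]]
print(best_match(build_table(structures), query)[0])
```

Storing a set per hash bin keeps one structure and basis from collecting more than one vote per query key, which is one simple way to keep the vote counts comparable between structures.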

    • http://pepx.switchlab.org Peter Vanhee

      Another very useful tool for structural alignment is Mustang from Arun S. Konagurthu et al.:

      http://www.bx.psu.edu/arun/research/mustang

      If you would like to combine sequence with structural alignment, I would certainly recommend Expresso (part of the T-Coffee suite developed by the lab of Cedric Notredame). A webserver is implemented at

      http://www.tcoffee.org/

    © 2009 Rosetta Design Group LLC