Jan
04
2010

NAR 2010 DB issue – What’s in it for us?

NAR just published the new special “Database” issue for 2010. We collected 23 papers that might be of interest to structural biologists, modelers and other protein people.

3D-footprint: a database for the structural analysis of protein–DNA complexes

Bruno Contreras-Moreira1,2,3,*

1Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, 2Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain and 3Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico

*To whom correspondence should be addressed. Tel: +34 976716089; Email: bcontreras@eead.csic.es

Received August 14, 2009. Revised September 2, 2009. Accepted September 3, 2009.

3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein–DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.

[FREE Full Text of Contreras-Moreira] [Reprint (PDF) Version of Contreras-Moreira]


PROSITE, a protein domain database for functional characterization and annotation

Christian J. A. Sigrist1,*, Lorenzo Cerutti1, Edouard de Castro1, Petra S. Langendijk-Genevaux1, Virginie Bulliard1, Amos Bairoch1,2 and Nicolas Hulo1

1Swiss Institute of Bioinformatics (SIB), Centre Médical Universitaire and 2Structural Biology and Bioinformatics Department, University of Geneva, 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland

*To whom correspondence should be addressed. Tel: +41 22 379 58 68; Fax: +41 22 379 58 58; Email: christian.sigrist@isb-sib.ch

Received September 3, 2009. Revised October 2, 2009. Accepted October 2, 2009.

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (~70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.
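As a rough illustration of how a PROSITE-style pattern can be used to scan a sequence, the sketch below translates a subset of the pattern syntax (`x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors) into a Python regular expression. The function name `prosite_to_regex` is hypothetical and is not part of ScanProsite; the example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern, and the test sequence is made up.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [ABC] classes and single letters are already valid regex.
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + count + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "AANGSAA")  # made-up test sequence
```

Profiles and ProRule annotations carry far more information than such a pattern can express, so this only covers the simplest kind of PROSITE descriptor.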

[FREE Full Text of Sigrist et al.] [Reprint (PDF) Version of Sigrist et al.]


ELM: the status of the 2010 eukaryotic linear motif resource

Cathryn M. Gould1, Francesca Diella1, Allegra Via2, Pål Puntervoll3, Christine Gemünd1, Sophie Chabanis-Davidson1, Sushama Michael1, Ahmed Sayadi2, Jan Christian Bryne3,4, Claudia Chica1, Markus Seiler1, Norman E. Davey1, Niall Haslam1, Robert J. Weatheritt1, Aidan Budd1, Tim Hughes5, Jakub Pas6, Leszek Rychlewski6, Gilles Travé7, Rein Aasland5, Manuela Helmer-Citterich8, Rune Linding9 and Toby J. Gibson1,*

1Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2Biocomputing Group, Department of Biochemical Sciences, ‘A. Rossi-Fanelli’, Sapienza Universita’ di Roma, P.le Aldo Moro, 5, 00185 Rome, Italy, 3Computational Biology Unit, Bergen Centre for Computational Science, Høyteknologisenteret, Thormøhlensgate 55, 4Sars Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, 5Department of Molecular Biology, University of Bergen, HIB, Thormøhlensgt. 55, 5020 Bergen, Norway, 6BioInfoBank Institute, Limanowskiego 24A16 60-744, Poznan, Poland, 7ESBS, 1, Bld Sébastien Brandt, BP10413, 67412 Illkirch, France, 8Centre for Molecular Bioinformatics, Department of Biology, University of Rome ‘Tor Vergata’, Via della Ricerca Scientifica, 00133 Rome, Italy and 9Cellular & Molecular Logic Team, The Institute of Cancer Research (ICR), Section of Cell and Molecular Biology, SW3 6JB London, UK

*To whom correspondence should be addressed. Tel: +49 6221 387398; Fax: +49 6221 387517; Email: gibson@embl-heidelberg.de

Received September 14, 2009. Revised October 16, 2009. Accepted October 19, 2009.

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.


The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Present address: Christine Gemünd, Cellzome AG, Meyerhofstrasse 1, 69117 Heidelberg, Germany.

[FREE Full Text of Gould et al.] [Reprint (PDF) Version of Gould et al.]


MeMotif: a database of linear motifs in α-helical transmembrane proteins

Annalisa Marsico*, Kerstin Scheubert, Anne Tuukkanen, Andreas Henschel, Christof Winter, Rainer Winnenburg and Michael Schroeder

Bioinformatics Department, Biotechnology Center, TU Dresden, Tatzberg 47/49, 01307 Dresden, Germany

* To whom correspondence should be addressed. Tel: +49 (0) 35146340067; Email: annalisa@biotec.tu-dresden.de

Received September 1, 2009. Revised October 22, 2009. Accepted October 23, 2009.

Membrane proteins are important for many processes in the cell and used as main drug targets. The increasing number of high-resolution structures available makes for the first time a characterization of local structural and functional motifs in α-helical transmembrane proteins possible. MeMotif (http://projects.biotec.tu-dresden.de/memotif) is a database and wiki which collects more than 2000 known and novel computationally predicted linear motifs in α-helical transmembrane proteins. Motifs are fully described in terms of several structural and functional features and editable. Motifs contained in MeMotif can be used in different biological applications, from the identification of biochemically important functional residues which are candidates for mutagenesis experiments to the improvement of tools for transmembrane protein modeling.

[FREE Full Text of Marsico et al.] [Reprint (PDF) Version of Marsico et al.]


The Pfam protein families database

Robert D. Finn1,*, Jaina Mistry1, John Tate1, Penny Coggill1, Andreas Heger2, Joanne E. Pollington1, O. Luke Gavin1, Prasad Gunasekaran1, Goran Ceric3, Kristoffer Forslund4, Liisa Holm5, Erik L. L. Sonnhammer4, Sean R. Eddy3 and Alex Bateman1

1Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, 2Department of Physiology, Anatomy and Genetics, MRC Functional Genomics Unit, University of Oxford, Oxford, UK, 3Janelia Farm Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA, 4Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-10691 Stockholm, Sweden and 5Institute of Biotechnology and Department of Biological and Environmental Sciences, University of Helsinki, PO Box 56 (Viikinkaari 5), 00014 Helsinki, Finland

*To whom correspondence should be addressed. Tel: +44 1223 495330; Fax: +44 1223 494919; Email: rdf@sanger.ac.uk

Received October 12, 2009. Accepted October 15, 2009.

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ~100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

[FREE Full Text of Finn et al.] [Reprint (PDF) Version of Finn et al.]


3DNALandscapes: a database for exploring the conformational features of DNA

Guohui Zheng1, Andrew V. Colasanti1, Xiang-Jun Lu1,2 and Wilma K. Olson1,*

1Department of Chemistry & Chemical Biology, BioMaPS Institute for Quantitative Biology, Rutgers, the State University of New Jersey, Wright-Rieman Laboratories, 610 Taylor Road, Piscataway, NJ 08854 and 2Department of Biological Sciences, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA

*To whom correspondence should be addressed. Tel: +1 732 445 3993; Fax: +1 732 445 5958; Email: wilma.olson@rutgers.edu

Received August 15, 2009. Revised October 10, 2009. Accepted October 13, 2009.

3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures—both free and bound to proteins, drugs and other ligands—currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.

[FREE Full Text of Zheng et al.] [Reprint (PDF) Version of Zheng et al.]


NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure

Douglas H. Turner1 and David H. Mathews2,*

1Department of Chemistry and Center for RNA Biology, Box 0216, University of Rochester, Rochester, NY 14627-0216 and 2Department of Biochemistry and Biophysics and Center for RNA Biology, Box 712, University of Rochester Medical Center, Rochester, NY 14642, USA

*To whom correspondence should be addressed. Tel: +1 585 275 1734; Fax: +1 585 275 6007; Email: david_mathews@urmc.rochester.edu

Received August 17, 2009. Revised October 4, 2009. Accepted October 6, 2009.

The Nearest Neighbor Database (NNDB, http://rna.urmc.rochester.edu/NNDB) is a web-based resource for disseminating parameter sets for predicting nucleic acid secondary structure stabilities. For each set of parameters, the database includes the set of rules with descriptive text, sequence-dependent parameters in plain text and html, literature references to experiments and usage tutorials. The initial release covers parameters for predicting RNA folding free energy and enthalpy changes.

[FREE Full Text of Turner and Mathews] [Reprint (PDF) Version of Turner and Mathews]


ComSin: database of protein structures in bound (complex) and unbound (single) states in relation to their intrinsic disorder

Michail Yu. Lobanov1, Benjamin A. Shoemaker2, Sergiy O. Garbuzynskiy1, Jessica H. Fong2, Anna R. Panchenko2 and Oxana V. Galzitskaya1,*

1Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia and 2National Center for Biotechnology Information, NIH, Bethesda, MD, USA

*To whom correspondence should be addressed. Tel/Fax: +7-495-6327871; Email: ogalzit@vega.protres.ru

Received August 14, 2009. Revised October 9, 2009. Accepted October 13, 2009.

Most of the proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states to provide a researcher with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states, and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have sequence identity 90% or larger. The developed server enables one to search for proteins in bound and unbound states with several options including sequence similarity between the corresponding proteins in bound and unbound states, and validation of interaction interfaces of protein complexes. Besides that, through our web server, one can obtain necessary information for studying disorder-to-order and order-to-disorder transitions upon complex formation, and analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/.

[FREE Full Text of Lobanov et al.] [Reprint (PDF) Version of Lobanov et al.]


fPOP: footprinting functional pockets of proteins by comparative spatial patterns

Yan Yuan Tseng1,*, Z. Jeffrey Chen2 and Wen-Hsiung Li1,3

1Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, 2Center for Computational Biology and Bioinformatics, University of Texas at Austin, One University Station, C4500, Austin, TX 78712, USA and 3Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan

*To whom correspondence should be addressed. Tel: +1 773 834 3965; Fax: +1 773 702 9740; Email: ytseng3@uchicago.edu

Received August 11, 2009. Revised September 21, 2009. Accepted October 6, 2009.

fPOP (footprinting Pockets Of Proteins, http://pocket.uchicago.edu/fpop/) is a relational database of the protein functional surfaces identified by analyzing the shapes of binding sites in ~42 700 structures, including both holo and apo forms. We previously used a purely geometric method to extract the spatial patterns of functional surfaces (split pockets) in ~19 000 bound structures and constructed a database, SplitPocket (http://pocket.uchicago.edu/). These functional surfaces are now used as spatial templates to predict the binding surfaces of unbound structures. To conduct a shape comparison, we use the Smith–Waterman algorithm to footprint an unbound pocket fragment with those of the functional surfaces in SplitPocket. The pairwise alignment of the unbound and bound pocket fragments is used to evaluate the local structural similarity via geometric matching. The final results of our large-scale computation, including ~90 000 identified or predicted functional surfaces, are stored in fPOP. This database provides an easily accessible resource for studying functional surfaces, assessing conformational changes between bound and unbound forms and analyzing functional divergence. Moreover, it may facilitate the exploration of the physicochemical textures of molecules and the inference of protein function. Finally, our approach provides a framework for classification of proteins into families on the basis of their functional surfaces.
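The abstract names the Smith–Waterman algorithm as the basis for comparing pocket fragments. A minimal sketch of Smith–Waterman local alignment scoring is given below; the match, mismatch and gap values are assumptions chosen for illustration and are not the scoring scheme used by fPOP, and the residue strings are made up.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Two hypothetical fragments sharing the stretch "ACDE".
score = smith_waterman_score("XXACDEXX", "YYACDEYY")
```

Only the score is returned here; recovering the aligned residues would require keeping the full matrix and tracing back from the best cell.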

[FREE Full Text of Tseng et al.] [Reprint (PDF) Version of Tseng et al.]


IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF

François Ehrenmann1, Quentin Kaas1 and Marie-Paule Lefranc1,2,*

1IMGT®, the international ImMunoGeneTics information system®, Université Montpellier 2, Laboratoire d’ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier cedex 5 and 2Institut Universitaire de France, 103 Bd St Michel, 75005 Paris, France

*To whom correspondence should be addressed. Tel: +33 4 99 61 99 65; Fax: +33 4 99 61 99 01; Email: marie-paule.lefranc@igh.cnrs.fr

Received September 11, 2009. Revised October 9, 2009. Accepted October 12, 2009.

IMGT/3Dstructure-DB is the three-dimensional (3D) structure database of IMGT®, the international ImMunoGeneTics information system® that is acknowledged as the global reference in immunogenetics and immunoinformatics. IMGT/3Dstructure-DB contains 3D structures of immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility complex (MHC) proteins, antigen receptor/antigen complexes (IG/Ag, TR/peptide/MHC) of vertebrates; 3D structures of related proteins of the immune system (RPI) of vertebrates and invertebrates, belonging to the immunoglobulin and MHC superfamilies (IgSF and MhcSF, respectively) and found in complexes with IG, TR or MHC. IMGT/3Dstructure-DB data are annotated according to the IMGT criteria, using IMGT/DomainGapAlign, and based on the IMGT-ONTOLOGY concepts and axioms. IMGT/3Dstructure-DB provides IMGT gene and allele identification (CLASSIFICATION), region and domain delimitations (DESCRIPTION), amino acid positions according to the IMGT unique numbering (NUMEROTATION) that are used in IMGT/3Dstructure-DB cards, results of contact analysis and renumbered flat files. In its Web version, the IMGT/DomainGapAlign tool analyses amino acid sequences, per domain. Coupled to the IMGT/Collier-de-Perles tool, it provides an invaluable help for antibody engineering and humanization design based on complementarity determining region (CDR) grafting as it precisely defines the standardized framework regions (FR-IMGT) and CDR-IMGT. IMGT/3Dstructure-DB and IMGT/DomainGapAlign are freely available at http://www.imgt.org.


Present address: Quentin Kaas, The University of Queensland, Institute for Molecular Bioscience, Brisbane QLD 4072, Australia.

[FREE Full Text of Ehrenmann et al.] [Reprint (PDF) Version of Ehrenmann et al.]


PDBe: Protein Data Bank in Europe

S. Velankar*, C. Best, B. Beuth, C. H. Boutselakis, N. Cobley, A. W. Sousa Da Silva, D. Dimitropoulos, A. Golovin, M. Hirshberg, M. John, E. B. Krissinel, R. Newman, T. Oldfield, A. Pajon, C. J. Penkett, J. Pineda-Castillo, G. Sahni, S. Sen, R. Slowley, A. Suarez-Uruena, J. Swaminathan, G. van Ginkel, W. F. Vranken, K. Henrick and G. J. Kleywegt*

Protein Databank in Europe, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

*To whom correspondence should be addressed. Tel: +44 1223 494646; Fax: +44 1223 494468; Email: sameer@ebi.ac.uk

Correspondence may also be addressed to G. J. Kleywegt. Tel: +44 1223 492663; Fax: +44 1223 494468; Email: gerard@ebi.ac.uk

Received September 15, 2009. Accepted October 7, 2009.

The Protein Data Bank in Europe (PDBe) (http://www.ebi.ac.uk/pdbe/) is actively working with its Worldwide Protein Data Bank partners to enhance the quality and consistency of the international archive of bio-macromolecular structure data, the Protein Data Bank (PDB). PDBe also works closely with its collaborators at the European Bioinformatics Institute and the scientific community around the world to enhance its databases and services by adding curated and actively maintained derived data to the existing structural data in the PDB. We have developed a new database infrastructure based on the remediated PDB archive data and a specially designed database for storing information on interactions between proteins and bound molecules. The group has developed new services that allow users to carry out simple textual queries or more complex 3D structure-based queries. The newly designed ‘PDBeView Atlas pages’ provide an overview of an individual PDB entry in a user-friendly layout and serve as a starting point to further explore the information available in the PDBe database. PDBe’s active involvement with the X-ray crystallography, Nuclear Magnetic Resonance spectroscopy and cryo-Electron Microscopy communities has resulted in improved tools for structure deposition and analysis.


Present address: R. Newman, A. Pajon, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK

[FREE Full Text of Velankar et al.] [Reprint (PDF) Version of Velankar et al.]


PDBselect 1992–2009 and PDBfilter-select

Sven Griep and Uwe Hobohm*

University of Applied Sciences Giessen, Bioinformatics, D-35390 Giessen, Germany

*To whom correspondence should be addressed. Tel: 0641 3092580; Fax: 0641 3092549; Email: uwe.hobohm@tg.fh-giessen.de

Received August 1, 2009. Accepted September 4, 2009.

PDBselect (http://bioinfo.tg.fh-giessen.de/pdbselect/) is a list of representative protein chains with low mutual sequence identity selected from the protein data bank (PDB) to enable unbiased statistics. The list increased from 155 chains in 1992 to more than 4500 chains in 2009. PDBfilter-select is an online service to generate user-defined selections.
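To make the idea of a representative list with low mutual sequence identity concrete, here is a minimal greedy sketch: a chain is kept only if it is sufficiently dissimilar to every chain already kept. This is not the PDBselect procedure itself. The similarity measure (`difflib.SequenceMatcher.ratio`, a crude stand-in for alignment-based identity), the 0.25 threshold and the toy sequences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b).ratio()

def select_representatives(chains, threshold=0.25):
    """Greedily keep chains whose similarity to every kept chain is below threshold."""
    kept = []
    for chain in chains:
        if all(similarity(chain, other) < threshold for other in kept):
            kept.append(chain)
    return kept

# Toy input: the second chain differs from the first by a single residue.
chains = ["ACDEFGHIKL", "ACDEFGHIKM", "WWWWYYYYPP"]
representatives = select_representatives(chains)
```

The result of a greedy pass depends on the input order, so in practice chains would be sorted first, for example by structure quality, before selection.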

[FREE Full Text of Griep and Hobohm] [Reprint (PDF) Version of Griep and Hobohm]


Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry

Donald S. Berkholz1, Peter B. Krenesky2, John R. Davidson2 and P. Andrew Karplus1,*

1Department of Biochemistry and Biophysics, Oregon State University, 2011 ALS and 2Open Source Lab, Oregon State University, B211 Kerr Admin, Corvallis OR 97331, USA

*To whom correspondence should be addressed. Tel: +1 541 737 3200; Fax: +1 541 737 0481; Email: karplusp@science.oregonstate.edu

Received August 14, 2009. Revised October 14, 2009. Accepted October 19, 2009.

The backbone bond lengths, bond angles, and planarity of a protein are influenced by the backbone conformation (φ,ψ), but no tool exists to explore these relationships, leaving this area as a reservoir of untapped information about protein structure and function. The Protein Geometry Database (PGD) enables biologists to easily and flexibly query information about the conformation alone, the backbone geometry alone, and the relationships between them. The capabilities the PGD provides are valuable for assessing the uniqueness of observed conformational or geometric features in protein structure as well as discovering novel features and principles of protein structure. The PGD server is available at http://pgd.science.oregonstate.edu/ and the data and code underlying it are freely available to use and extend.
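For readers unfamiliar with the backbone conformation angles, φ is the dihedral defined by the atoms C(i−1), N, Cα, C and ψ the dihedral defined by N, Cα, C, N(i+1). The sketch below computes a dihedral angle from four Cartesian points using the standard atan2 formulation; it is a generic illustration rather than code from the PGD, and the coordinates are made-up test points, not real atom positions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical points: the last point is rotated a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Passing the coordinates of C(i−1), N, Cα and C of a residue would give its φ; shifting the window by one atom along the backbone gives ψ.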

[FREE Full Text of Berkholz et al.] [Reprint (PDF) Version of Berkholz et al.]


PTGL: a database for secondary structure-based protein topologies

Patrick May1,*, Annika Kreuchwig2, Thomas Steinke3 and Ina Koch4,5,*

1Max Planck Institute for Molecular Plant Physiology, Bioinformatics, Am Muehlenberg 1, 14476 Potsdam-Golm, 2Leibniz-Institut fuer Molekulare Pharmakologie, Structural Bioinformatics, Robert-Roessle-Strasse 10, 13125 Berlin, 3Zuse Institute Berlin, Computer Science Research, Takustrasse 7, 14195 Berlin, 4Beuth University for Technology Berlin, FB VI, Bioinformatics, Seestrasse 64, 13347 Berlin and 5Max Planck Institute for Molecular Genetics, Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany

*To whom correspondence should be addressed. Fax: +49 (0)331 5678136; Email: may@mpimp-golm.mpg.de

Correspondence may also be addressed to Ina Koch. Tel: +49 (0)30/8413-1168; Fax: +49 (0)30/8413-1152; Email: koch_i@molgen.mpg.de

Received August 15, 2009. Revised October 14, 2009. Accepted October 15, 2009.

With the growing amount of experimental data, the number of known protein structures also increases continuously. Classification of protein structures helps to understand relationships between protein structure and function. The main classification methods based on secondary structures are SCOP, CATH and TOPS, which all classify under different aspects, and therefore can lead to different results. We developed a mathematically unique representation of protein structure topologies at a higher abstraction level, providing new aspects of classification and enabling a fast search through the data. Protein Topology Graph Library (PTGL; http://ptgl.zib.de) aims at providing a database on protein secondary structure topologies, including search facilities, visualization as intuitive topology diagrams as well as in the 3D structure, and additional information. Secondary structure-based protein topologies are represented uniquely as undirected labeled graphs in four different ways, allowing for exploration under different aspects. The linear notations, and the 2D and 3D diagrams of each notation, facilitate a deeper understanding of protein topologies. Several search functions for topologies and sub-topologies, a BLAST search option, and links to SCOP, CATH and PDBsum support individual and large-scale investigation of protein structures. Currently, PTGL comprises topologies of 54 859 protein structures. Main structural patterns for common structural motifs like TIM-barrel or Jelly Roll are pre-implemented, and can easily be searched.

[FREE Full Text of May et al.] [Reprint (PDF) Version of May et al.]


CORUM: the comprehensive resource of mammalian protein complexes—2009

Andreas Ruepp1,*, Brigitte Waegele1,2, Martin Lechner1, Barbara Brauner1, Irmtraud Dunger-Kaltenbach1, Gisela Fobo1, Goar Frishman1, Corinna Montrone1 and H.-Werner Mewes1,2

1Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg and 2Technische Universität München, Chair of Genome Oriented Bioinformatics, Center of Life and Food Science, D-85350 Freising-Weihenstephan, Germany

*To whom correspondence should be addressed. Tel: +49 3187 3189; Fax: +49 3187 3585; Email: andreas.ruepp@helmholtz-muenchen.de

Received September 10, 2009. Accepted October 7, 2009.

CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing ~16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a ‘Phylogenetic Conservation’ analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html).

[FREE Full Text of Ruepp et al.] [Reprint (PDF) Version of Ruepp et al.]


GWIDD: Genome-wide protein docking database

Petras J. Kundrotas1, Zhengwei Zhu1 and Ilya A. Vakser1,2,*

1Center for Bioinformatics and 2Department of Molecular Biosciences, 2030 Becker Drive, The University of Kansas, Lawrence, KS 66047, USA

*To whom correspondence should be addressed. Tel: +1 785 584 1057; Fax: +1 785 864 5558; Email: vakser@ku.edu

Received August 10, 2009. Revised September 27, 2009. Accepted October 12, 2009.

Structural information on interacting proteins is important for understanding life processes at the molecular level. The Genome-wide docking database is an integrated resource for structural studies of protein–protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. The current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanning the entire universe of life from viruses to humans. Data are organized in a relational database with a user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu.

[FREE Full Text of Kundrotas et al.] [Reprint (PDF) Version of Kundrotas et al.]


Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites

Benjamin A. Shoemaker, Dachuan Zhang, Ratna R. Thangudu, Manoj Tyagi, Jessica H. Fong, Aron Marchler-Bauer, Stephen H. Bryant, Thomas Madej* and Anna R. Panchenko*

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

*To whom correspondence should be addressed. Tel: +1 301 435 5891; Fax: +1 301 480 4637; Email: panch@ncbi.nlm.nih.gov Correspondence may also be addressed to Thomas Madej. Tel: +1 301 435 5998; Fax: +1 301 480 4637; Email: madej@ncbi.nlm.nih.gov

Received August 15, 2009. Revised September 16, 2009. Accepted September 21, 2009.

IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.

[FREE Full Text of Shoemaker et al.] [Reprint (PDF) Version of Shoemaker et al.]


The IntAct molecular interaction database in 2010

B. Aranda1, P. Achuthan1, Y. Alam-Faruque1, I. Armean, A. Bridge2, C. Derow1, M. Feuermann2, A. T. Ghanbarian1, S. Kerrien1, J. Khadake1, J. Kerssemakers1, C. Leroy1, M. Menden1, M. Michaut1, L. Montecchi-Palazzi1, S. N. Neuhauser1, S. Orchard1, V. Perreau3, B. Roechert2, K. van Eijk1 and H. Hermjakob1,*

1EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK, 2Swiss Institute of Bioinformatics Geneva, Switzerland and 3Centre for Neuroscience, University of Melbourne, Australia

*To whom correspondence should be addressed. Tel: +44 1223 49 4671; Fax: +44 1223 49 4468; Email: hhe@ebi.ac.uk

Received September 11, 2009. Revised October 1, 2009. Accepted October 2, 2009.

IntAct is an open-source, open data molecular interaction database and toolkit. Data is abstracted from the literature or from direct data depositions by expert curators following a deep annotation model providing a high level of detail. As of September 2009, IntAct contains over 200 000 curated binary interaction evidences. In response to the growing data volume and user requests, IntAct now provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results are provided at any stage in a simplified, tabular view. Specialized views then allow ‘zooming in’ on the full annotation of interactions, interactors and their properties. IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.

[FREE Full Text of Aranda et al.] [Reprint (PDF) Version of Aranda et al.]


MINT, the molecular interaction database: 2009 update

Arnaud Ceol1,*, Andrew Chatr Aryamontri1, Luana Licata1, Daniele Peluso2, Leonardo Briganti1, Livia Perfetto1, Luisa Castagnoli1 and Gianni Cesareni1,2

1Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 and 2IRCCS Fondazione Santa Lucia, 00143 Rome

*To whom correspondence should be addressed. Tel: +39 067 2594315; Fax: +39 062 023500; Email: arnaud.ceol@uniroma2.it

Received September 11, 2009. Revised October 13, 2009. Accepted October 15, 2009.

MINT (http://mint.bio.uniroma2.it/mint) is a public repository for molecular interactions reported in peer-reviewed journals. Since its last report, MINT has grown considerably in size and evolved in scope to meet the requirements of its users. The main changes include a more precise definition of the curation policy and the development of an enhanced and user-friendly interface to facilitate the analysis of the ever-growing interaction dataset. MINT has adopted the PSI-MI standards for the annotation and for the representation of molecular interactions and is a member of the IMEx consortium.
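For readers who want to work with PSI-MI data directly: the PSI-MI standard includes a tab-delimited flavor (MITAB 2.5) with 15 columns per binary interaction, the first two holding the interactor identifiers in `database:accession` form. Below is a minimal sketch of reading such lines in Python. That a given MINT download follows exactly this layout is an assumption on our part, and the function names are hypothetical.

```python
def first_accession(field):
    """Take the first 'db:accession' entry of a '|'-separated MITAB field."""
    first = field.split("|")[0]
    db, sep, accession = first.partition(":")
    return accession if sep else db


def parse_mitab_line(line):
    """Parse one MITAB 2.5 line into a small dict (only a few columns kept)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    return {
        "a": first_accession(cols[0]),   # column 1: identifier of interactor A
        "b": first_accession(cols[1]),   # column 2: identifier of interactor B
        "method": cols[6],               # column 7: interaction detection method
        "publication": cols[8],          # column 9: publication identifier
    }


def unique_pairs(lines):
    """Collapse interaction evidences into unordered, de-duplicated pairs."""
    pairs = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        record = parse_mitab_line(line)
        pairs.add(frozenset((record["a"], record["b"])))
    return pairs
```

Using `frozenset` for the pair means A–B and B–A count as the same interaction, which is usually what you want when several papers report the same pair.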


The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

[FREE Full Text of Ceol et al.] [Reprint (PDF) Version of Ceol et al.]


The Negatome database: a reference set of non-interacting protein pairs

Pawel Smialowski1,2, Philipp Pagel1,2, Philip Wong1, Barbara Brauner1, Irmtraud Dunger1, Gisela Fobo1, Goar Frishman1, Corinna Montrone1, Thomas Rattei2, Dmitrij Frishman1,2,* and Andreas Ruepp1

1Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, 85764 Neuherberg and 2Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany

*To whom correspondence should be addressed. Tel: +49 8161 712134; Fax: +49 8161 712186; Email: d.frishman@wzw.tum.de

Received October 14, 2009. Revised October 19, 2009. Accepted October 20, 2009.

The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.
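To make the "evaluate new experiments" use case concrete, here is a minimal sketch of scoring a set of predicted interactions against a positive reference set and a negative set such as the Negatome. Everything here is illustrative: the function names are ours, the inputs are plain lists of protein-identifier pairs, and predictions that appear in neither reference set are simply ignored.

```python
def normalize(pairs):
    """Treat pairs as unordered, so (A, B) and (B, A) are the same pair."""
    return {frozenset(p) for p in pairs}


def ratio(numerator, denominator):
    """Return a fraction, or None when it is undefined."""
    return numerator / denominator if denominator else None


def evaluate(predicted, positives, negatives):
    """Confusion counts and basic metrics for predicted interacting pairs.

    positives: reference interacting pairs (e.g. from a curated database)
    negatives: reference non-interacting pairs (e.g. from the Negatome)
    """
    pred, pos, neg = normalize(predicted), normalize(positives), normalize(negatives)
    tp = len(pred & pos)   # predicted and known to interact
    fp = len(pred & neg)   # predicted but known not to interact
    fn = len(pos - pred)   # known interaction that was missed
    tn = len(neg - pred)   # known non-interaction, correctly not predicted
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

The point of a curated negative set is the `fp` and `tn` counts: without trusted non-interacting pairs, specificity cannot be estimated at all, and precision has to rely on randomly sampled "negatives".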


The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

[FREE Full Text of Smialowski et al.] [Reprint (PDF) Version of Smialowski et al.]


PepX: a structural database of non-redundant protein–peptide complexes

Peter Vanhee1,2, Joke Reumers1,2, Francois Stricher3, Lies Baeten1,2, Luis Serrano3, Joost Schymkowitz1,2,* and Frederic Rousseau1,2,*

1VIB SWITCH Laboratory, 2Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium, 3EMBL-CRG Systems Biology Unit, CRG-Centre de Regulacio Genomica, Dr Aiguader 88, 08003 Barcelona and 4ICREA. Institucio Catala de Recerca i Estudis Avancats. Passeig Lluís Companys, 23 08010 Barcelona, Spain

*To whom correspondence should be addressed. Tel: +32 2 629 14 25; Fax: +32 2 629 19 63; Email: joost.schymkowitz@switch.vib-vub.be

Correspondence may also be addressed to Frederic Rousseau. Email: frederic.rousseau@switch.vib-vub.be

Received August 15, 2009. Revised October 1, 2009. Accepted October 6, 2009.

Although protein–peptide interactions are estimated to constitute up to 40% of all protein interactions, relatively little information is available for the structural details of these interactions. Peptide-mediated interactions are a prime target for drug design because they are predominantly present in signaling and regulatory networks. A reliable data set of nonredundant protein–peptide complexes is indispensable as a basis for modeling and design, but current data sets for protein–peptide interactions are often biased towards specific types of interactions or are limited to interactions with small ligands. In PepX (http://pepx.switchlab.org), we have designed an unbiased and exhaustive data set of all protein–peptide complexes available in the Protein Data Bank with peptide lengths up to 35 residues. In addition, these complexes have been clustered based on their binding interfaces rather than sequence homology, providing a set of structurally diverse protein–peptide interactions. The final data set contains 505 unique protein–peptide interface clusters from 1431 complexes. Thorough annotation of each complex with both biological and structural information facilitates searching for and browsing through individual complexes and clusters. Moreover, we provide an additional source of data for peptide design by annotating peptides with naturally occurring backbone variations using fragment clusters from the BriX database.

[FREE Full Text of Vanhee et al.] [Reprint (PDF) Version of Vanhee et al.]


Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community

Catherine Y. Cormier1,2,3, Stephanie E. Mohr3,4, Dongmei Zuo2,3, Yanhui Hu2,3, Andreas Rolfs2,3, Jason Kramer1,2,3, Elena Taycher2,3, Fontina Kelley2,3, Michael Fiacco1,2, Greggory Turnbull1 and Joshua LaBaer1,2,3,*

1Arizona State University, Biodesign Institute, Virginia G. Piper Center for Personalized Diagnostics, 1001 S. McAllister Dr. Tempe, AZ 85287-6401, 2Protein Structure Initiative Material Repository, Arizona State University, 1001 S. McAllister Dr. Tempe, AZ 85287-6401, 3Harvard Medical School, 240 Longwood Avenue, Boston, 4Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA

*To whom correspondence should be addressed. Tel: +1 480 965 2805; Fax: +1 480 965 3051; Email: jlabaer@asu.edu

Received September 4, 2009. Revised October 15, 2009. Accepted October 16, 2009.

The Protein Structure Initiative Material Repository (PSI-MR; http://psimr.asu.edu) provides centralized storage and distribution for the protein expression plasmids created by PSI researchers. These plasmids are a resource that allows the research community to dissect the biological function of proteins whose structures have been identified by the PSI. The plasmid annotation, which includes the full length sequence, vector information and associated publications, is stored in a freely available, searchable database called DNASU (http://dnasu.asu.edu). Each PSI plasmid is also linked to a variety of additional resources, which facilitates cross-referencing of a particular plasmid to protein annotations and experimental data. Plasmid samples can be requested directly through the website. We have also developed a novel strategy to avoid the most common concern encountered when distributing plasmids, namely the complexity of material transfer agreement (MTA) processing and the resulting delays. The Expedited Process MTA, in which we created a network of institutions that agree to the terms of transfer in advance of a material request, eliminates these delays. Our hope is that by creating a repository of expression-ready plasmids and expediting the process for receiving these plasmids, we will help accelerate the accessibility and pace of scientific discovery.

[FREE Full Text of Cormier et al.] [Reprint (PDF) Version of Cormier et al.]


BioNumbers—the database of key numbers in molecular and cell biology

Ron Milo1,*, Paul Jorgensen2,3, Uri Moran1, Griffin Weber4 and Michael Springer2

1Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel, 2Department of Systems Biology, Harvard Medical School, Boston, MA 02445, USA, 3Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1 and 4Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA

*To whom correspondence should be addressed. Tel: +97289344466; Fax: +97289344181; Email: ron.milo@weizmann.ac.il

Received July 16, 2009. Accepted October 2, 2009.

BioNumbers (http://www.bionumbers.hms.harvard.edu) is a database of key numbers in molecular and cell biology—the quantitative properties of biological systems of interest to computational, systems and molecular cell biologists. Contents of the database range from cell sizes to metabolite concentrations, from reaction rates to generation times, from genome sizes to the number of mitochondria in a cell. While always of importance to biologists, having numbers in hand is becoming increasingly critical for experimenting, modeling, and analyzing biological systems. BioNumbers was motivated by an appreciation of how long it can take to find even the simplest number in the vast biological literature. All numbers are taken directly from a literature source and that reference is provided with the number. BioNumbers is designed to be highly searchable and queries can be performed by keywords or browsed by menus. BioNumbers is a collaborative community platform where registered users can add content and make comments on existing data. All new entries and commentary are curated to maintain high quality. Here we describe the database characteristics and implementation, demonstrate its use, and discuss future directions for its development.

[FREE Full Text of Milo et al.] [Reprint (PDF) Version of Milo et al.]

Written by Nir London in: Literature Reviews, Resources, Title Madness

© 2009 Rosetta Design Group LLC