SnugDock: Paratope structural optimization during antibody-antigen docking compensates for errors in homology models
Aroop Sircar, GrayLab, Johns Hopkins University
High-resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and for making rational choices in antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we present a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone high-resolution docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions (6 of 15 with more than 50% fnat and 6 of 15 with irms < 2.5 Å) than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit, resulting in increased diversity of antibody conformations sampled. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions, CAPRI, rating) and seven acceptable lowest-energy predictions in a test set of fifteen complexes (2 of 15 with more than 50% fnat and 5 of 15 with irms < 2.5 Å). SnugDock can also achieve robust, successful predictions using homology models from the Web Antibody Modeling or Prediction of Immunoglobulin Structure servers. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Identification of structural mechanisms of HIV-1 protease specificity using computational peptide docking and implications for drug resistance
Sid Chaudhury, GrayLab, Johns Hopkins University
Inhibitors that target human immunodeficiency virus type 1 (HIV-1) protease and disrupt viral assembly and maturation are critical to a number of antiretroviral therapies, but their efficacy is limited by the ability of HIV-1 to develop drug resistance mutations (DRMs) in the protease gene. Protease active-site residue interactions that are determined to be critical for native substrate selectivity could serve as robust targets for drug design that are immune to DRMs. Towards that end, we developed a novel peptide docking algorithm to predict the structure of protease-substrate complexes in atomic detail and observe the underlying structural and energetic mechanisms that guide selectivity. We generated structural models for a diverse set of 69 known cleavable peptides and 43 non-cleavable peptides. The total energy, taken as the sum of the individual residue energies for substrate peptide and active-site residues, revealed a statistically significant difference (p < 10^-6) between cleavable and non-cleavable peptides. Further analyses revealed that while 34 active-site residues were involved in substrate binding, only six were primarily responsible for specificity. Surprisingly, all six residues correspond to sequence positions associated with drug resistance mutations: D30, I47, G48, L76, V82, and I84. These results demonstrate that the very residues that are responsible for native substrate specificity in HIV-1 protease are altered during its evolution to drug resistance, suggesting that drug resistance and substrate selectivity may share common mechanisms. We expand on the substrate envelope hypothesis to accommodate both specificity and promiscuity and discuss its implications for drug resistance.
Alpha helical crossovers favor right-handed supersecondary structures by a kinetic trapping mechanism. The phone cord effect in protein folding
Chris Bystroff, Dept of Biology, Rensselaer Polytechnic Institute, Troy, NY
When a polypeptide chain forms an alpha helix, the ends of the chain must rotate to dissipate the accumulated torque, reminiscent of the way a phone cord forms superhelices to relieve the torsional stress of forming a helix. The rotation of the ends is in the clockwise direction, leading to crossover conformations that are right-handed, the predominant handedness of helical crossovers in protein structures. This rope-like behavior stems from the high energetic barriers to bond rotations in polypeptide backbones. The rotational forces generated by this "phone cord effect" can explain how some proteins form knots.
Cole BJ & Bystroff C. (2009) Alpha helical crossovers favor right-handed supersecondary structures by a kinetic trapping mechanism. The phone cord effect in protein folding. Protein Science [Epub July 2009] http://www.bioinfo.rpi.edu/bystrc/phonecord/
Genome Annotation using Structure Prediction Techniques
Kevin Drew, Bonneau Lab, New York University
Genome annotation is currently based on transferring what is known about well-studied genes to homologous genes in newly sequenced genomes. The basis of this transfer lies in the observation that genes with similar sequences have similar functional roles in the cell. This basis begins to break down at high levels of sequence divergence (<20% sequence identity) and therefore leaves a large fraction of genomes unannotated. Structural homology, in which proteins share similar 3-dimensional structures, is conserved longer over evolutionary time and can be used to extend the coverage of genome annotation efforts. SCOP superfamilies were determined for proteins from > 80 genomes, including most model organisms, by fold recognition and de novo prediction techniques. A naïve Bayes method was then used to determine molecular function based on SCOP superfamily and other evidence such as GO cellular component and biological process.
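As a rough illustration of the naïve Bayes step described above, the sketch below classifies a protein's molecular function from categorical evidence. It is a generic categorical naïve Bayes with Laplace smoothing, not the authors' implementation; the feature names, example labels, and the `train`/`predict` helpers are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples, alpha=1.0):
    """Count label and per-label feature-value frequencies.

    examples: list of (features_dict, label) pairs; alpha is the Laplace
    smoothing constant (an assumption, not taken from the abstract).
    """
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (label, feature) -> value counts
    vocab = defaultdict(set)              # feature -> set of observed values
    for feats, label in examples:
        for name, value in feats.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab, alpha

def predict(model, feats):
    """Return the label with the highest posterior log-probability."""
    label_counts, value_counts, vocab, alpha = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        logp = math.log(n / total)  # prior
        for name, value in feats.items():
            counts = value_counts[(label, name)]
            # Smoothed conditional probability P(value | label)
            logp += math.log((counts[value] + alpha) / (n + alpha * len(vocab[name])))
        scores[label] = logp
    return max(scores, key=scores.get)

# Toy, made-up training data for illustration only.
examples = [
    ({"superfamily": "P-loop NTPase", "component": "cytoplasm"}, "ATP binding"),
    ({"superfamily": "P-loop NTPase", "component": "nucleus"}, "ATP binding"),
    ({"superfamily": "Winged helix", "component": "nucleus"}, "DNA binding"),
    ({"superfamily": "Winged helix", "component": "nucleus"}, "DNA binding"),
]
model = train(examples)
prediction = predict(model, {"superfamily": "Winged helix", "component": "nucleus"})
```

The independence assumption lets each line of evidence (superfamily, cellular component, biological process) contribute a separate log-probability term, which is what makes it easy to combine heterogeneous annotations.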
Predicting Temperature-Sensitive Mutations
Chris Poultney, Bonneau Lab, New York University
Description Pending.
HIV Gp-160 Targeted Broad Neutralizing Antibodies - Modeling and Design
Jordan Willis, Meiler Lab, Vanderbilt University
To date, only four broad neutralizing antibodies have been characterized that target the HIV envelope protein gp160, the only available surface target of the virus. It is hypothesized that the selective pressure involved in evolving these antibodies to maturity is limited to turning on biochemical signals that allow the parent B cells to proliferate. This gives a potency of roughly 10^-9 M (Kd). Using the dock design scripter, we can characterize various scores involved in the high-resolution interactions at the antibody-antigen interface. We can then determine an appropriate weight set for the broad neutralizing antibodies that gives them neutralizing ability. We can then design mutations in the dock design scripter that can regain this neutralizing ability for escape mutants of gp-160, based on the weight set that has been previously determined.
BCL::EM-Fold and BCL::EM-Fit: Protein Folding and Fitting tools for medium resolution cryoEM density maps
Steffen Lindert, Nils Woetzel, Meiler Lab, Vanderbilt University
Using cryo-electron microscopy (cryoEM), numerous medium resolution (6 to 12 Å) density maps of large macromolecular assemblies have been reported. Generally no atomic detail is resolved in these density maps, making it impossible to deduce the protein structure from the density map alone. Combining these maps with computational algorithms that are tailored for the specific map can help elucidate the protein structure. We wish to present novel protein fitting and folding algorithms that aid in the interpretation of medium resolution cryoEM density maps.
If an atomic model (crystal structure or comparative model) for the protein structure exists, BCL::EM-Fit accurately and rapidly fits atomic-detail structural models into medium resolution density maps. In an initial step a "geometric hashing" feature recognition algorithm rapidly provides a short list of likely placements. In a follow-up Monte Carlo/Metropolis refinement step the initial placements are optimized in their cross correlation coefficient. The resolution of density maps for a reliable fit was determined to be 11 Å or better. The algorithm was applied to fitting of capsid proteins into an experimental cryoEM density map of human adenovirus at a resolution of 6.9 Å. In the process, the handedness of the cryoEM density map was unambiguously identified. The algorithm is at least three times faster than established Fourier/Real space fitting programs.
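The Monte Carlo/Metropolis refinement of a placement against the cross correlation coefficient can be sketched on a one-dimensional toy problem. This is only an illustration of the general technique under simplifying assumptions, not BCL::EM-Fit itself: the "density" is a 1D Gaussian profile, the only degree of freedom is its center, and `ccc`, `gaussian_profile`, `metropolis_fit`, and all parameter values are hypothetical.

```python
import math
import random

def ccc(a, b):
    """Pearson cross correlation coefficient of two equal-length density samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def gaussian_profile(center, n=50, width=3.0):
    """Toy 1D 'density' of a model placed at the given center."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(n)]

def metropolis_fit(target, start, steps=500, temperature=0.02, seed=0):
    """Refine the model position to maximize CCC with the target map."""
    rng = random.Random(seed)
    pos = start
    score = ccc(gaussian_profile(pos), target)
    best_pos, best_score = pos, score
    for _ in range(steps):
        # Random perturbation, clamped so the model stays inside the map.
        trial = min(max(pos + rng.uniform(-2.0, 2.0), 0.0), len(target) - 1.0)
        trial_score = ccc(gaussian_profile(trial), target)
        # Metropolis criterion: accept improvements always, and worse
        # placements with Boltzmann probability.
        delta = trial_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            pos, score = trial, trial_score
            if score > best_score:
                best_pos, best_score = pos, score
    return best_pos, best_score

target_map = gaussian_profile(30.0)
start_score = ccc(gaussian_profile(20.0), target_map)
best_pos, best_score = metropolis_fit(target_map, start=20.0)
```

In a real fitting problem the perturbation would act on six rigid-body degrees of freedom and the density would be three-dimensional; the acceptance rule is the same.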
If no atomic model is available for fitting but secondary structure elements are visible in the density map (α-helices in particular are observed as density rods at sub-nanometer resolution), BCL::EM-Fold can fold protein models into the observed density rods. This is done by incorporating the experimental cryoEM data as restraints. The placement of helices is restricted to regions where density rods are observed in the cryoEM density map. The Monte Carlo based protein folding algorithm is further driven by knowledge-based energy functions.
BCL::EM-Fold has been benchmarked with ten highly α-helical proteins of known structure. The chosen proteins range in size from 250 to 350 residues. Starting with knowledge of the true secondary structure for these ten proteins, the method can identify the correct topology within the top scoring 10 models. With more realistic secondary structure prediction information, the correct topology is found within the top scoring 5 models for seven of the ten proteins. The algorithm has been applied to human adenovirus protein IIIa. This protein, for which there is no high resolution structure, is predicted to be highly α-helical. It is resolved in a 6.8 Å resolution cryoEM adenovirus structure as a bundle of 14 α-helical density rods.
Developing Protein Structure Prediction Methods Assisted by EPR Experimental Data
Stephanie Hirst, Meiler Lab, Vanderbilt University
Site-Directed Spin Labeling Electron Paramagnetic Resonance (SDSL-EPR) in combination with the Rosetta protein folding algorithm (Rohl et al, 2004) could serve as an alternative method in structure elucidation of proteins that continue to evade traditional techniques, such as x-ray crystallography and NMR. A spin label "motion-on-a-cone" model was used during de novo folding of T4-lysozyme and αA-crystallin, which resulted in full-atom models at 1.0 Å and 2.6 Å from the crystal structures, respectively (Alexander et al, 2008). This spin label model and already-existing EPR distance data have been used to generate a knowledge-based potential, which we plan to implement into Rosetta as a constraints function. In addition, we have introduced a rotamer library of the methanethiosulfonate spin label (MTSSL). Spin label rotamers have been derived from conformations observed in crystal structures of spin-labeled T4-lysozyme. The method was benchmarked using a set of proteins where the spin label was positioned at various levels of exposure. The results indicate that the method is able to recover important aspects of spin label orientation with up to 0.4 Å RMSD. In particular, experimental distances and distance distributions observed for T4-lysozyme were reproduced with relatively high accuracy.
Improving Protein Surface Design in Rosetta
Sam DeLuca, Steven Combs, Meiler Lab, Vanderbilt University
Experimental expression and characterization of TIM barrels designed by Meiler et al. using Rosetta suggests that, while Rosetta is relatively effective at designing the core of proteins, it fails to accurately design surfaces. TIM barrels designed and expressed using Rosetta had major deficiencies in folding stability. In an attempt to correct these errors, we have designed and benchmarked a new knowledge-based surface potential which provides a bonus to residues which are frequently found at a given level of burial in the protein. The knowledge-based potential was calculated as the -log(propensity) of amino acids at varying levels of burial in a database of approximately 1900 high resolution crystal structures in the PDB. Degree of burial is estimated by using the Neighbor Vector method, which calculates a weighted vector sum of the neighbors of each residue. This knowledge-based potential provides a bonus to amino acids which are more likely in general to occur at a given degree of burial. We present benchmarks of this energy potential showing a 3-4% increase in sequence recovery over a range of proteins.
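A -log(propensity) potential of this kind can be sketched as follows. The abstract does not define propensity, so the sketch assumes the common choice of observed count divided by the count expected if amino acid identity and burial were independent; the burial bin labels, the pseudocount, and the `burial_potential` helper are hypothetical, and the toy counts are made up.

```python
import math
from collections import Counter

def burial_potential(observations, pseudocount=1.0):
    """Return {(amino_acid, burial_bin): -log(propensity)}.

    observations: iterable of (amino_acid, burial_bin) pairs, one per residue.
    Propensity is assumed to be observed / expected, where expected is the
    count predicted if residue type and burial were independent.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    aa_counts = Counter(aa for aa, _ in observations)
    bin_counts = Counter(b for _, b in observations)
    total = len(observations)
    potential = {}
    for aa in aa_counts:
        for b in bin_counts:
            observed = pair_counts[(aa, b)] + pseudocount
            expected = aa_counts[aa] * bin_counts[b] / total + pseudocount
            # Negative values are a bonus: the residue type is enriched at this burial.
            potential[(aa, b)] = -math.log(observed / expected)
    return potential

# Toy data: leucine mostly buried, lysine mostly exposed.
toy = ([("L", "buried")] * 8 + [("L", "exposed")] * 2
       + [("K", "buried")] * 2 + [("K", "exposed")] * 8)
potential = burial_potential(toy)
```

With these toy counts leucine receives a bonus (negative score) when buried and a penalty when exposed, and lysine shows the opposite pattern.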
Web Server based Interface to Backrub Modeling & Design in Rosetta
Florian Lauck, Kortemme Lab, UCSF
In this project, a web-server-based interface to Rosetta was created that provides access to several applications developed by the Kortemme lab. The code consists of two independent parts: a front-end and a back-end. Both are written in Python and were designed in a way that makes it easy to implement new applications. The front-end was first used as a new user interface for Interface Alanine Scanning, which is also part of the Robetta server. The new Kortemme Lab web server will host a number of applications that utilize the backrub protocol to create single and multiple point mutations, simple ensembles, ensembles that resemble NMR measurements, and flexible backbone library design. Other applications are planned as well.
SAT-based Protein Design
Noah Ollikainen, Kortemme Lab, UCSF
Computational protein design can be formulated as an optimization problem, where the objective is to identify the sequence of amino acids that minimizes the energy of a given protein structure. We propose a novel search-based approach that utilizes a Boolean function to encode the solution space where the function's onset represents the sequences considered during the search. We first present a dead-end-elimination (DEE) based method for the initial setup of the Boolean function and then describe a branch-and-bound algorithm that employs the search and deduction engine of a modern Boolean Satisfiability (SAT) solver. Its fast implication processing and conflict-based learning provide an efficient framework for the overall algorithm. Our results indicate that the presented approach can efficiently find the guaranteed optimum solution for protein core design problems. Furthermore, since our method is complete and symbolic, it can find all solutions that are within an epsilon-distance from the global minimum. This capability allows further analysis, such as identifying common sequence patterns of close-to-optimum solutions. Lastly, the SAT-based encoding of the search space provides a flexible mechanism to take complex design constraints into account, such as enforcing dependencies for amino acid choices at different positions or optimizing a single amino acid sequence to be simultaneously consistent with multiple input structures.
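To make the dead-end-elimination idea concrete, here is a minimal sketch of the standard Goldstein DEE criterion on a pairwise-decomposable energy function. It illustrates generic DEE pruning only; it is not the SAT encoding or the specific setup method of this work, and the data layout and the `goldstein_dee` helper are assumptions made for the example.

```python
def goldstein_dee(self_e, pair_e):
    """Prune rotamers that cannot be part of the global minimum.

    self_e[i][r]: self energy of rotamer r at position i.
    pair_e[(i, j)][r][s]: pair energy of rotamer r at i and s at j, for i < j.
    Returns a list with the set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    changed = True
    while changed:  # repeat until no further rotamer can be eliminated
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Goldstein criterion: r is dead if switching to t lowers
                    # the energy for every choice of rotamers elsewhere.
                    gap = self_e[i][r] - self_e[i][t]
                    gap += sum(
                        min(pair(i, r, j, s) - pair(i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i
                    )
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

# Toy problem: two positions with two rotamers each (made-up energies).
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {(0, 1): [[-1.0, 0.5], [0.2, 0.0]]}
survivors = goldstein_dee(self_e, pair_e)
```

Rotamers that survive pruning define the reduced search space; in the approach described above, that reduced space is what the Boolean function would then encode for the SAT-based search.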
Computational Protein-Peptide Docking Recapitulates Specificity of PDZ Domains
Joseph J. Crivelli, Andrew Morin, Kristian Kaufmann, Meiler Lab, Vanderbilt University
PDZ domains are protein-peptide interaction modules which play crucial roles in many biological pathways. Past studies have classified PDZ domains based on specificity of interaction. A computational protocol to predict PDZ domain specificity relies heavily on accurate prediction of binding energy (ΔΔG). For a subset of mutants of Erbin PDZ and their phage-derived C-terminal peptide binding partners, ROSETTA was utilized to predict binders and non-binders. Following relaxation and docking of all combinations of complexes, ΔΔG was calculated. The energy term constituents of the ΔΔG score were reweighted with a Monte Carlo optimization algorithm. Specificity profiles were then generated with the optimized weight set. It was determined that ROSETTA accurately predicts PDZ domain specificity and, to a limited degree, peptide promiscuity.
Computational Structural and Evolutionary analysis of the Neurotransmitter Sodium Symporter
Alexej Grjasnow, Kristian Kaufmann, Jens Meiler, Meiler Lab, Vanderbilt University
Description pending.
From Rat Brains to Rosetta: Computational Models of the Calcium/Calmodulin alpha-Actinin Complex
Kristian Kaufmann, Jens Meiler, Meiler Lab, Vanderbilt University
Description pending.
Sequence recovery of protein/peptide and protein/small-molecule interfaces using Rosetta3
Andrew Morin, Joseph Crivelli, Kristian Kaufmann, Jens Meiler, Meiler Lab, Vanderbilt University
The prediction and design of protein/peptide and protein/small-molecule interfaces is an important but relatively unproven capability of Rosetta3. While significant investigation into designing protein/protein interfaces, altering protein/protein interaction specificity, engineering catalysis, predicting small-molecule binding affinity and designing small-molecule pharmacophores has been and is being undertaken, the design of protein interfaces to peptides and small molecules constitutes something of a gap in Rosetta research. As an initial step in investigating Rosetta3's proficiency at designing protein/peptide and protein/small-molecule interfaces, an extensive sequence recovery benchmark was performed on a diverse and representative set of liganded protein holo structures derived from the LPDB. A statistical examination was also made of the structural and sequence-specific properties of wild-type peptide and small-molecule binding interfaces. These wild-type interface propensities were then compared to Rosetta-designed interfaces.
Comparative Modeling Applications of Rosetta at the Vanderbilt Center for Structural Biology
Eric Dawson, Jonathan Sheehan, Meiler Lab, Vanderbilt University
The Vanderbilt Center for Structural Biology fields many requests from researchers who wish to enhance their capabilities in computational biology. Our most frequent request is comparative model building and structural assessment of both comparative models and the structural templates on which they are based. As part of the Rosetta Commons, the research group of Jens Meiler strongly supports this effort and contributes emerging developments from ongoing Rosetta codebase development toward challenging model building and structural assessment projects. Here we present several examples that reflect the broad range of modeling projects featuring application of Rosetta model building, structural refinement, ligand docking and energetic analysis to membrane-bound and membrane-associated protein targets. Systems modeled include inward rectifying potassium channels (KiR/ROM-K), cytoplasmic domains of CLH3a/b chloride channels, and Class A G-protein coupled receptors including biogenic amine and nucleoside receptors.
Non-canonical Amino Acids in Protein/Peptide Interface Design
P. Douglas Renfrew, Brian Kuhlman
Description pending.
The structural basis of peptide-protein binding strategies
Nir London, Dana Attias, Ora Schueler-Furman, The Hebrew University
Peptide-protein interactions are among the most prevalent interactions in the cell. They mediate important processes, such as signal transduction and protein trafficking. How can peptides overcome the entropic cost involved in association, switching from an unstructured, flexible peptide to a rigid, well-defined structure at the interface? A structure-based analysis of peptide-protein interactions reveals that peptides use a number of strategies to compensate for this entropy loss. In particular, most peptides do not induce conformational changes in their partner upon binding, thereby minimizing the entropic cost of binding. Furthermore, peptides optimize the binding enthalpy of the interaction: they display interfaces that are better packed than protein-protein interfaces, with significantly more hydrogen bonds (per constant interface size). In addition, they utilize their flexibility to the fullest in creating more interactions that involve main chain atoms. The distribution of binding energy along the peptide is not uniform; we find that on average one "hotspot" residue is required per three peptide residues. Finally, we show that peptides tend to bind in the largest pockets available on the protein surface. In addition to improved understanding of basic principles that underlie peptide-protein interactions, our findings have direct implications for the development of protocols for the structural modeling, design and manipulation of these interactions. This analysis is based on peptiDB, a new and comprehensive dataset of high-resolution peptide-protein complex structures.
Computational redesign of gp120 towards improved immunogenicity
Sergey Menis and Bill Schief, Schief Lab, UW
Attachment to the human CD4 receptor is the first step of cell entry by the human immunodeficiency virus type 1 (HIV-1). A major aim of HIV vaccine design is to elicit antibodies able to block this first step and thereby neutralize the virus. Three copies of gp120 and gp41 form a trimeric spike on the surface of the viral membrane; each of the gp120 monomers contains a CD4 binding site. Most antibodies elicited by monomeric gp120 do not neutralize because they bind immunodominant epitopes on gp120 that are occluded on the functional trimeric spike. We aim to design variants of monomeric gp120 to elicit neutralizing antibodies that target the CD4-binding site, by stabilizing the CD4-bound conformation and trimming away undesired epitopes. We are using a combination of flexible backbone protein design with Rosetta and screening of directed libraries on the surface of yeast by fluorescence-activated cell sorting. We removed most of the inner domain of gp120 and replaced it with a computationally optimized linker. A directed library spanning the computational design was screened on the surface of yeast for binding to the b12 antibody as a proxy for CD4. The enriched library and individual clones show low nM dissociation constants when titrated against b12 antibody.
Understanding neurotrophin specificity through flexible-backbone docking
Whitney Smith, John Karanicolas, University of Kansas
Description Pending.
Differentiable orientation-dependent potentials
Patrick Buck and Chris Bystroff, Rensselaer Polytechnic Institute, Troy, NY
Recently there has been an increased interest in off-lattice continuum based simulation for protein folding, which may do better than Monte Carlo based methods in following a realistic folding pathway. Continuum simulations may be capable of modeling, for example, the phone cord effect on topology, while fragment assembly Monte Carlo, which allows chain crossings, may not. The challenge of searching a continuum requires a coarse-grained protein representation for maximum speed. However, coarse-grained forcefields are poorly suited to capture the orientation-dependence of backbone hydrogen-bonding and side-chain packing. Some of these inaccuracies can be corrected with orientation-dependent energetic terms that capture the anisotropic behavior of these interactions. For example, a knowledge-based forcefield has been calculated for orientation-dependent hydrogen bonds, parameterized on alpha-carbon positions only. A general procedure is outlined to find the derivatives for these special types of potentials.
De novo protein-protein interface design via beta-strand pairing
Ben Stranges, Kuhlman Lab, University of North Carolina Chapel Hill
One of the primary challenges in protein-protein interface design is satisfying hydrogen bond geometry at the interface. Many natural proteins use main-chain hydrogen bonds between strands to form an interface. Proteins with exposed beta-strands are thought to be prone to interact. Using an intermolecular beta-sheet as the basis of a protein-protein interface is an attractive method for de novo interface design because it satisfies many hydrogen bonds across the interface and provides an anchor for the design of the remainder of the complex. This work describes an approach to create a protein-protein interface that forms an intermolecular beta-sheet. Thus far, computational methods using Rosetta have been unable to make a high affinity interaction, but directed evolution of the designed interface has yielded promising results.
Metal-seeded protein interface design
Bryan Der, Kuhlman Lab, University of North Carolina Chapel Hill
In designing a new protein interface, it is a major challenge to achieve tight binding between the target and scaffold. To promote tight binding, a metal coordination site will be incorporated at the interface; metal coordination is a strong interaction that occurs in nature for many purposes, including stabilization of protein structure. To imitate the tetrahedral coordination geometry observed in known structural zinc sites, we explore two approaches: a 3x1 approach designs 3 liganding residues at the scaffold surface and the fourth is provided by the target, and a 2x2 approach designs 2 liganding residues at the scaffold surface and the remaining 2 are provided by the target. In this way, metal coordination may provide an enthalpic boost to favor tight interaction of previously non-interacting proteins.
How do camelid antibodies differ in sequence and structure?
Kayode Adeola Sanni and Aroop Sircar, GrayLab, Johns Hopkins University
Regular antibodies are composed of two identical heavy chains and two identical light chains. However, the serum of species in the family Camelidae (Bactrian and dromedary camels, llamas and alpacas) contains a considerable fraction of heavy-chain antibodies (HCAbs) that lack the light chain. The heavy chain of an HCAb is made up of three instead of four globular domains. The CH2 and CH3 constant domains are very similar to those found in classical antibodies; however, HCAbs do not have the CH1 domain. Hence the camelid Fab is reduced to a single variable domain (VHH). VHHs are also referred to as nanobodies, and are also found in some kinds of shark, like the nurse shark. Although the VHH paratope is expected to be smaller than those found in regular IgGs comprising both the light and heavy chains, the VHH paratope is enlarged by the extension of the CDR H1 loop and longer CDR H3 loops (average 16 amino acids). Because nanobodies lack light chains, they are more soluble, easier to clone, more modular in nature, and easier to produce than regular antibodies; they also bind to difficult-to-access antigens and cavities. For the past two months at Johns Hopkins, I have been analyzing the structures and sequences of known camelid antibodies in order to find rules that can help in modeling future antibodies.
On the Origin of Symmetry in Biological Macromolecules
Ingemar Andre, Charlie E.M. Strauss, David B. Kaplan, Phil Bradley, David Baker
Why are symmetric oligomers so common? Our hypothesis is that symmetrical complexes are favored not because symmetry itself provides specific functional advantages but because the initial pool of protein complexes available for competitive selection is heavily biased towards symmetrical binding modes. The probability of observing a complex with a given degree of symmetry decreases with increasing symmetry, but the variance in the monomer-monomer interaction energy increases with increasing symmetry. Evolutionary selection retains beneficial functions, but to have any benefit a complex must be sufficiently populated. Our premise is that only primordial complexes with binding energy low enough to compensate for the entropy barrier (~10 kcal/mol) are function-competent candidates for further evolutionary optimization. Both simple analytic models and explicit protein-protein docking simulations show that these two counteracting factors lead to a predominance of symmetry in low energy binding modes. Thus the function-competent primordial sub-population available for evolutionary selection is overwhelmingly symmetric.