Feb 16, 2010

Tradeoff between stability and multispecificity in the design of promiscuous proteins

Traditionally, computational protein design efforts have been directed at calculating a single sequence predicted to fold to a particular target structure. Recently, however, a number of conceptual generalizations have been pursued, including backbone flexibility, off-rotamer side chain flexibility, negative design, multi-body potentials, conformational free energy, and the prediction of sequence profiles. Below I present our state-of-the-art research, whose goal is to understand how protein sequences are optimized to be compatible with binding multiple partners with high affinity. – By Menachem Fromer.

In 2007, Elisabeth Humphris and Tanja Kortemme published a very intriguing paper in PLoS Computational Biology entitled Design of Multi-Specificity in Protein Interfaces. They investigated a set of 65 protein-protein interfaces, in which each of 20 proteins was bound to 2 or more different partner proteins. Analyzing these 20 proteins through the lens of protein design, they found 2 general classes of proteins:

(i) “multi-faceted” interfaces, where each partner exerted significant forces on only a subset of the protein’s residues, so that the native protein sequence was better recovered only by accounting for all partners simultaneously.

(ii) “shared” interfaces, where all binding partners have key energetic interactions with the same residues in the protein.

So, indeed, they had found that multispecific binding must be accommodated at the sequence level. But we wanted more detail about how these adaptations for multiple states occur. For example, as we gradually require a protein to perform additional interactions, are the changes it undergoes gradual, or is there an abrupt destabilization at first, after which further interactions have only a minimal effect on the protein? Clearly, the answers to such questions would shed light on both naturally occurring evolution and directed evolution of protein sequences.

Together with my advisor, Michal Linial, I touched upon some of these questions in our recent paper in Proteins: Structure, Function, and Bioinformatics entitled Design of multispecific protein sequences using probabilistic graphical modeling. There, we developed a novel computational framework, which employs probabilistic graphical models and belief propagation (BP), to efficiently and accurately design proteins that partake in numerous protein-protein interactions. For some small-scale cases taken from the Humphris and Kortemme paper, we found that there are multiple ways in which “sequence compromises” can be achieved when optimizing a protein to bind multiple partners. For example, when designing protein X to bind protein Y, protein X may prefer serine (S) at a particular interface position; when designing X to bind protein Z, it may prefer isoleucine (I) at that position. But when requiring that X bind both Y and Z, it may be the case that X prefers an amino acid different from both S and I, e.g., glutamine (Q). This was in fact the scenario we found for the Thioredoxin test case in our paper (position 75).
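
The S / I / Q scenario above can be reproduced with a toy calculation. This is only a minimal sketch, not the BP-based framework from the paper: the energy values are invented for illustration, and scoring a multi-state design by the plain sum of per-state energies is an assumption made here for simplicity.

```python
# Hypothetical energies (arbitrary units, lower is better) for three candidate
# amino acids at one interface position of protein X. The numbers are made up.
ENERGY = {
    "bind_Y": {"S": -3.0, "I": 2.0, "Q": -2.0},
    "bind_Z": {"S": 2.5, "I": -3.0, "Q": -2.0},
}


def best_residue(states):
    """Return the amino acid with the lowest summed energy over the given states.

    Summing the per-state energies is an illustrative assumption, not
    necessarily the objective used in the paper.
    """
    candidates = ENERGY[states[0]].keys()
    return min(candidates, key=lambda aa: sum(ENERGY[s][aa] for s in states))


print(best_residue(["bind_Y"]))            # S: best for binding Y alone
print(best_residue(["bind_Z"]))            # I: best for binding Z alone
print(best_residue(["bind_Y", "bind_Z"]))  # Q: best compromise for both
```

Q is never the top choice for either partner alone, but it is the only candidate that is acceptable to both, so it wins once both states are scored together.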

To take this one step further, Julia Shifman and I collaborated on large-scale simulations for a single multi-binder: calmodulin (CaM). Our paper, entitled Tradeoff between stability and multispecificity in the design of promiscuous proteins, was recently published in PLoS Computational Biology.

CaM is a promiscuous protein that mediates multiple cellular functions by selectively binding to hundreds of protein partners. For 16 of these partners, a solved structure (X-ray or NMR) of the complex with CaM was available, and all 16 partners bind in virtually the same core section of CaM:

16 structures of CaM bound to different partners

We then optimally designed the CaM interface (using probabilistic graphical modeling, DEE, and Monte Carlo simulated annealing) to be capable of binding each of the 16 partners, each pair of partners, each subset of 3 partners, and all of the 16 partners:

Multi state design strategy
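
To get a feel for the size of this design strategy, the scenarios can be enumerated directly. The sketch below takes the description literally (every single partner, every pair, every triple, plus the full set of 16); the partner labels are placeholders, not the actual PDB codes.

```python
from itertools import combinations
from math import comb

# Placeholder labels for the 16 partners with solved CaM complexes.
partners = [f"partner_{i:02d}" for i in range(1, 17)]


def design_scenarios(items):
    """List all subsets of size 1, 2 and 3, followed by the full set."""
    scenarios = []
    for k in (1, 2, 3):
        scenarios.extend(combinations(items, k))
    scenarios.append(tuple(items))
    return scenarios


scenarios = design_scenarios(partners)
print(comb(16, 1), comb(16, 2), comb(16, 3))  # 16 120 560
print(len(scenarios))                         # 697
```

Under this reading there are 16 + 120 + 560 + 1 = 697 design scenarios.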

For each such design scenario, our goal was to find a set of 100 sequences that are optimal for binding the target subset of partner peptides. Interestingly, we found that every imaginable kind of sequence compromise can and does occur when requiring CaM to bind two different partners, in this case the cardiac Ca(v)1.2 calcium channel IQ domain (PDB code 2F3Y) and the R-type Ca(v)2.3 calcium channel IQ domain (PDB code 3BXL). We observed 5 major categories of compromise:

  1. Both single-state designs have similar profiles, and the 2-state design chooses this profile.
  2. The 2-state design yields a profile that is a combination of those obtained for each single-state design.
  3. The 2-state design yields a distribution of amino acids that is similar to that of only one of the single-state designs.
  4. The 2-state design yields a profile that is different from those of both single-state designs.
  5. Despite the single-state designs having similar profiles, the 2-state profile is different.
Types of compromise strategies
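
The five categories can be turned into a simple decision rule on per-position amino acid profiles. The sketch below is one possible way to do it and is not the procedure used in the paper: the similarity measure (shared probability mass), the 0.7 threshold, and all function names are assumptions chosen for illustration.

```python
def overlap(p, q):
    """Shared probability mass of two amino acid profiles (1.0 = identical)."""
    return sum(min(p.get(aa, 0.0), q.get(aa, 0.0)) for aa in set(p) | set(q))


def mix(p, q):
    """Equal-weight combination of two profiles."""
    return {aa: 0.5 * (p.get(aa, 0.0) + q.get(aa, 0.0)) for aa in set(p) | set(q)}


def classify(single_a, single_b, two_state, threshold=0.7):
    """Assign one of the 5 compromise categories (threshold is hypothetical)."""
    like_a = overlap(two_state, single_a) >= threshold
    like_b = overlap(two_state, single_b) >= threshold
    if overlap(single_a, single_b) >= threshold:
        # Single-state profiles agree: the 2-state design either follows them (1)
        # or departs from them (5).
        return 1 if (like_a and like_b) else 5
    if like_a != like_b:
        return 3  # resembles only one of the single-state designs
    if overlap(two_state, mix(single_a, single_b)) >= threshold:
        return 2  # resembles a combination of both
    return 4      # resembles neither


print(classify({"L": 1.0}, {"L": 1.0}, {"L": 1.0}))            # 1
print(classify({"S": 1.0}, {"I": 1.0}, {"S": 0.5, "I": 0.5}))  # 2
print(classify({"S": 1.0}, {"I": 1.0}, {"S": 1.0}))            # 3
print(classify({"S": 1.0}, {"I": 1.0}, {"Q": 1.0}))            # 4
print(classify({"L": 1.0}, {"L": 1.0}, {"M": 1.0}))            # 5
```

Each profile is a dictionary mapping one-letter amino acid codes to frequencies that sum to 1, for example as estimated from a set of 100 designed sequences.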

However, as expected, some of these phenomena are more frequent than others. For example, while the 1st scenario is the most common at many of the interface positions, quite frequently the compromise required for multispecific binding allows only one partner’s preferences to win out (the 3rd scenario).

Frequencies of compromise strategies

Now, you may ask: do these compromises really reflect the underlying biology, or are they perhaps an artifact of the modeling or of the energy function used (in this case, the ORBIT energy function)? The answer is that these compromises, on the whole, do seem to explain the biology better than optimizing the protein sequence for binding each partner separately:

Distribution of the number of predicted mutations
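
A distribution like this can be computed by counting, for each designed sequence, the positions at which it differs from the wild type. The sketch below assumes equal-length sequences restricted to the designed interface positions; the sequences shown are made up and are not CaM residues.

```python
from collections import Counter


def count_mutations(designed, wild_type):
    """Number of positions at which a designed sequence differs from wild type."""
    if len(designed) != len(wild_type):
        raise ValueError("sequences must have the same length")
    return sum(1 for d, w in zip(designed, wild_type) if d != w)


def mutation_distribution(designs, wild_type):
    """Map each mutation count to the number of designs that have it."""
    return Counter(count_mutations(seq, wild_type) for seq in designs)


# Made-up interface sequences, for illustration only.
wild_type = "FLTMMA"
designs = ["FLTMMA", "FLSMMA", "ILSMMA", "FLTMMV"]

dist = mutation_distribution(designs, wild_type)
mean = sum(k * v for k, v in dist.items()) / len(designs)
print(sorted(dist.items()))  # [(0, 1), (1, 2), (2, 1)]
print(mean)                  # 1.0
```

Repeating this for the design sets of each scenario (1 partner, 2 partners, and so on) gives the average number of mutations as a function of the number of partners.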

Thus we found that, on average, as more CaM partners are incorporated into the CaM interface design process, the CaM sequence becomes more similar to the wild-type sequence. Perhaps unsurprisingly, this seems to happen monotonically, in that the average mutation rate for the designed sequences always decreases as additional partners are considered. However, one of the interesting patterns observed is that this process is not gradual. In fact, the decrease in binding optimality when going from 1 to 2 partners (A → A+B) is significantly greater than when going from 2 to 3 partners (A+B → A+B+C):

Finding the first "other" partner is harder than the next one
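
The idea that the first additional partner costs the most can be illustrated with a toy single-position example. This is only a sketch under stated assumptions: the energies are invented, the multi-state objective is taken to be the sum of per-partner energies, and "cost" is defined here as the energy partner A gives up relative to its own single-state optimum. None of this reproduces the actual CaM calculations.

```python
# Invented energies (arbitrary units, lower is better) for one interface
# position and three partners A, B and C.
ENERGY = {
    "A": {"S": -10.0, "Q": -7.0, "T": -6.5, "I": 1.0},
    "B": {"S": 4.0, "Q": -6.0, "T": -5.0, "I": -9.0},
    "C": {"S": 0.0, "Q": -4.0, "T": -6.0, "I": 3.0},
}
RESIDUES = ["S", "Q", "T", "I"]


def best_residue(partners):
    """Residue minimizing the summed energy over the partners (an assumption)."""
    return min(RESIDUES, key=lambda aa: sum(ENERGY[p][aa] for p in partners))


def cost_for_first_partner(partners):
    """Energy the first partner loses relative to its single-state optimum."""
    first = partners[0]
    return ENERGY[first][best_residue(partners)] - ENERGY[first][best_residue([first])]


step_1_to_2 = cost_for_first_partner(["A", "B"]) - cost_for_first_partner(["A"])
step_2_to_3 = cost_for_first_partner(["A", "B", "C"]) - cost_for_first_partner(["A", "B"])
print(step_1_to_2)  # 3.0
print(step_2_to_3)  # 0.5
```

In this toy case, most of the cost to partner A is paid when B is introduced (A → A+B); adding C on top (A+B → A+B+C) changes the chosen residue again but costs A much less.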

This result, which for now has been observed only for CaM and 16 of its partners (out of hundreds), may or may not hold true for other promiscuous proteins. More research is needed to conclude definitively whether this is an intrinsic quality of such multi-binders designed by evolution. Nevertheless, we expect this result to be at least somewhat applicable, depending on the similarity among the various binding partners and their affinities to the protein in question.

Finally, for those of you interested in using the probabilistic graphical modeling-based (multispecific) protein design program that I wrote (SPRINT), it should be released in the next few months at: http://www.protonet.cs.huji.ac.il/sprint


Menachem is finishing up his PhD studies in the labs of Michal Linial and Nati Linial at The Hebrew University of Jerusalem. His main research interests have involved the modeling of protein structures using probabilistic models and protein sequences using agglomerative evolutionary models.

Humphris EL, & Kortemme T (2007). Design of multi-specificity in protein interfaces. PLoS computational biology, 3 (8) PMID: 17722975
Fromer M, Yanover C, & Linial M (2010). Design of multispecific protein sequences using probabilistic graphical modeling. Proteins, 78 (3), 530-47 PMID: 19842166
Fromer M, & Shifman JM (2009). Tradeoff between stability and multispecificity in the design of promiscuous proteins. PLoS computational biology, 5 (12) PMID: 20041208


© 2009 Rosetta Design Group LLC