Difference between revisions of "Protein protein interaction prediction"
Latest revision as of 20:34, 11 January 2008
Protein-protein interaction prediction is a field combining bioinformatics and structural biology in an attempt to identify and catalog interactions between pairs or groups of proteins. Understanding protein-protein interactions is important for investigating intracellular signaling pathways. Experimentally, interactions between pairs of proteins are inferred from yeast two-hybrid systems, from affinity purification/mass spectrometry assays, or from protein microarrays. Computational methods are being developed in parallel with the experimental determination of the interactome.
Methods
Proteins that interact are more likely to co-evolve, so interactions between pairs of proteins can be inferred from their phylogenetic distances. In some cases, pairs of interacting proteins have also been observed to have orthologues that are fused into a single protein in other organisms. In addition, a number of bound protein complexes have been structurally solved and can be used to identify the residues that mediate the interaction, so that similar motifs can be located in other organisms.
Phylogenetic profiling
This method uses a sequence search tool such as BLAST to find homologues of each protein in a pair, then builds multiple sequence alignments with an alignment tool such as Clustal. From these multiple sequence alignments, a phylogenetic distance matrix is calculated for each protein in the hypothesized interacting pair. If the two matrices are sufficiently similar (as measured by their Pearson correlation coefficient), the proteins are deemed likely to interact.
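The matrix comparison step can be illustrated with a minimal sketch. It assumes the two distance matrices have already been computed elsewhere and that their rows and columns follow the same species order; the function names, the toy values, and the idea of comparing only the upper triangles are illustrative assumptions rather than a description of any particular published tool.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("distance matrix must be square")
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two distance matrices built over the same ordered species."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy matrices (hypothetical values) for two protein families in four species.
dist_a = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.8],
    [0.5, 0.4, 0.0, 0.6],
    [0.9, 0.8, 0.6, 0.0],
]
# Family B evolves twice as fast but with the same tree shape.
dist_b = [[2 * d for d in row] for row in dist_a]

r = matrix_correlation(dist_a, dist_b)
```

A pair would then be flagged as a candidate interaction when `r` exceeds a chosen cutoff. The cutoff has to be calibrated on known interacting and non-interacting pairs, because proteins from the same set of species share some background similarity in their trees regardless of interaction.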
Identification of homologous interacting pairs
This method searches a database of known complex structures for homologues of the two query sequences that form a complex with each other. Domains are identified by sequence searches against domain databases such as Pfam using BLAST. If more than one complex of Pfam domains is identified, the query sequences are aligned, using the hidden Markov model tool HMMER, to the closest identified homologues whose structures are known. The alignments are then analysed to check whether the contact residues of the known complex are conserved.
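The final conservation check can be sketched as follows. This is a deliberately simplified illustration: it assumes a pairwise alignment given as two gapped strings and a set of 0-based contact positions in the ungapped template sequence, and it counts only exact identities. The function name and the example sequences are hypothetical, and real tools typically score substitutions at contact positions rather than requiring identity.

```python
def contact_conservation(template_aln, query_aln, contact_positions):
    """Fraction of template contact residues that are identical in the query.

    template_aln, query_aln: aligned sequences of equal length, '-' for gaps.
    contact_positions: 0-based indices into the ungapped template sequence.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    contacts = set(contact_positions)
    if not contacts:
        raise ValueError("no contact positions given")

    template_pos = -1   # index into the ungapped template
    seen = 0            # contact positions found in the alignment
    conserved = 0       # contact positions where the query matches
    for t_res, q_res in zip(template_aln, query_aln):
        if t_res == "-":
            continue    # insertion in the query; no template residue here
        template_pos += 1
        if template_pos in contacts:
            seen += 1
            if q_res == t_res:
                conserved += 1

    if seen != len(contacts):
        raise ValueError("a contact position lies outside the template")
    return conserved / seen


# Hypothetical alignment; template contact residues are T (2), Y (4), Q (8).
template_aln = "MK-TAYIAKQR"
query_aln = "MKLTAY-AKHR"
score = contact_conservation(template_aln, query_aln, {2, 4, 8})
```

In this toy case two of the three contact residues are preserved in the query, so `score` is 2/3. Applying the same check to both partners of the template complex gives a rough indication of whether the query pair could form a similar interface.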
Identification of structural patterns
A third method builds a library of known protein-protein interfaces from the PDB, where an interface is defined as a pair of polypeptide fragments whose atoms lie closer together than a threshold slightly larger than the van der Waals radii of the atoms involved. The interfaces in the library are then clustered based on structural alignment, and redundant sequences are eliminated. Residues that occur at a given position with high frequency (generally >50%) are considered hotspots. The library is then used to identify potential interactions between pairs of targets, provided that they have a known structure (i.e. are present in the PDB).
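The frequency criterion for hotspots can be illustrated with a short sketch. It assumes that the interface fragments in one cluster have already been structurally aligned and are supplied as equal-length strings with '-' for gaps; the function name, the default cutoff of 0.5, and the example fragments are illustrative assumptions.

```python
from collections import Counter


def conserved_positions(aligned_fragments, min_freq=0.5):
    """Return {column: (residue, frequency)} for highly conserved columns.

    A column qualifies when its most common residue occurs in more than
    min_freq of all fragments in the cluster (gaps never qualify).
    """
    if not aligned_fragments:
        raise ValueError("cluster is empty")
    length = len(aligned_fragments[0])
    if any(len(frag) != length for frag in aligned_fragments):
        raise ValueError("aligned fragments must have equal length")

    hotspots = {}
    for col in range(length):
        counts = Counter(
            frag[col] for frag in aligned_fragments if frag[col] != "-"
        )
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        freq = count / len(aligned_fragments)
        if freq > min_freq:
            hotspots[col] = (residue, freq)
    return hotspots


# Hypothetical cluster of four aligned interface fragments.
cluster = ["WLYD", "WAYE", "WLFK", "WLYR"]
hotspots = conserved_positions(cluster)
```

Here the first three columns pass the cutoff (W in all four fragments, L and Y in three of four), while the last column is too variable to be reported.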
Bayesian network modelling
Bayesian methods integrate data from a wide variety of sources, including both experimental results and prior computational predictions, and use these features to assess the likelihood that a particular potential protein interaction is a true positive result. These methods are useful because experimental procedures, particularly yeast two-hybrid experiments, are extremely noisy and produce many false positives, while the computational methods described above can provide only circumstantial evidence that a particular pair of proteins might interact.
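One simple way to combine evidence in this spirit is a naive Bayes calculation, in which each evidence source contributes a likelihood ratio and the sources are assumed to be conditionally independent. The sketch below illustrates that simplification only; it is not a full Bayesian network, and the prior and the likelihood ratios are hypothetical numbers chosen for the example.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence into a posterior interaction probability.

    prior: prior probability that a random protein pair interacts (0 < p < 1).
    likelihood_ratios: one value per evidence source, each equal to
        P(evidence | interacting) / P(evidence | not interacting).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)      # prior odds
    for ratio in likelihood_ratios:
        if ratio < 0:
            raise ValueError("likelihood ratios must be non-negative")
        odds *= ratio                 # naive Bayes: ratios multiply
    return odds / (1.0 + odds)        # convert odds back to a probability


# Hypothetical inputs: a low prior and three supporting evidence sources,
# e.g. a two-hybrid hit, correlated expression, and a co-evolution signal.
prior = 1 / 600
ratios = [20.0, 5.0, 3.0]
posterior = posterior_probability(prior, ratios)
```

With these numbers the combined likelihood ratio is 300, yet the posterior is only about 0.33, because the prior odds of interaction for a random pair are so low. This is why several independent lines of evidence are usually needed before a predicted interaction is considered reliable.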
Relationship to docking methods
The field of protein-protein interaction prediction is closely related to the field of protein-protein docking, which attempts to use geometric and steric considerations to fit two proteins of known structure into a bound complex. Docking is a useful mode of inquiry when both proteins in the pair have known structures and are known (or at least strongly suspected) to interact. Because so many proteins do not have experimentally determined structures, however, sequence-based interaction prediction methods are especially useful in conjunction with experimental studies of an organism's interactome.
See also
- Protein-protein docking
- Two-hybrid screening
- Protein-DNA interaction site predictor
Servers
- InterProSurf
- ADVICE
- FastContact
- InterPreTS
- PRISM
- PIP
- SPPIDER
- cons-PPISP