Proteomics
Proteomics is the omics study of proteins, particularly their structures, sequences, and functions. Proteins are vital parts of living organisms, as they are the main components of the physiological metabolic pathways of cells. The term "proteomics" was coined by analogy with genomics, the study of genes. The word "proteome" is a portmanteau of "protein" and "genome". The proteome of an organism is the set of proteins it produces during its life, just as its genome is its set of genes.
Proteomics is often considered the next step in the study of biological systems, after genomics. It is much more complicated than genomics, mostly because while an organism's genome is rather constant, a proteome differs from cell to cell and constantly changes through its biochemical interactions with the genome and the environment. One organism has radically different protein expression in different parts of its body, at different stages of its life cycle and under different environmental conditions. Another major difficulty is the complexity of proteins relative to nucleic acids. In humans, for example, there are about 25,000 identified genes but an estimated 500,000 or more proteins derived from these genes. This increased complexity derives from mechanisms such as alternative splicing, protein modification (glycosylation, phosphorylation) and protein degradation.
Scientists are very interested in proteomics because it gives a much better understanding of an organism than genomics does. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Second, many proteins undergo post-translational modifications that profoundly affect their activities; for example, some proteins are not active until they become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are used to study post-translational modifications. Third, many transcripts give rise to more than one protein, through alternative splicing or alternative post-translational modifications. Finally, many proteins form complexes with other proteins or RNA molecules, and only function in the presence of these other molecules.
Since proteins play a central role in the life of an organism, proteomics is instrumental in the discovery of biomarkers, such as markers that indicate a particular disease.
With the completion of a rough draft of the human genome, many researchers are looking at how genes and proteins interact to form other proteins. A surprising finding of the Human Genome Project is that there are far fewer protein-coding genes in the human genome than proteins in the human proteome (20,000 to 25,000 genes vs. more than 500,000 proteins). The human body may even contain more than 2 million proteins, each having different functions. The protein diversity is thought to be due to alternative splicing and post-translational modification of proteins. The discrepancy implies that protein diversity cannot be fully characterized by gene expression analysis; proteomics is therefore useful for characterizing cells and tissues.
To catalog all human proteins, their functions and interactions is a great challenge for scientists. An international collaboration with these goals is co-ordinated by the Human Proteome Organization (HUPO).
Studying proteomics
Most proteins function in collaboration with other proteins, and one goal of proteomics is to identify which proteins interact. This often gives important clues about the functions of newly discovered proteins. Several methods are available to probe protein-protein interactions. The traditional method is yeast two-hybrid analysis. New methods include protein microarrays, immunoaffinity chromatography followed by mass spectrometry, and combinations of experimental methods such as phage display and computational methods.
Current research in proteomics requires first that proteins be resolved, sometimes on a massive scale. Protein separation can be performed using two-dimensional gel electrophoresis, which usually separates proteins first by isoelectric point and then by molecular weight. Protein spots in a gel can be visualized using a variety of chemical stains or fluorescent markers. Proteins can often be quantified by the intensity of their stain. Once proteins are separated and quantified, they are identified, usually by in-gel digestion and subsequent mass spectrometry. For the in-gel digestion, individual spots are cut out of the gel and cleaved into peptides with proteolytic enzymes. These peptides are used for the identification of the protein by peptide mass fingerprinting or de novo sequencing.
The peptide mass fingerprint relies on the specific pattern of peptide signals for a given protein in mass spectrometry, most often MALDI-TOF mass spectrometry. The pattern obtained in mass spectrometry is compared with database entries for the identification of the protein.
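The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a production search engine: it assumes an idealized trypsin rule (cleavage after K or R unless the next residue is P, no missed cleavages), singly protonated ions, and a simple score that counts matching peaks. The function names, the 0.2 Da tolerance, and the two-entry toy database are hypothetical and chosen only for the example.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide, for the free N- and C-termini
PROTON = 1.00728   # charge carrier for a singly protonated ion


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol=0.2):
    """Count observed peaks lying within tol of any theoretical peptide mass."""
    theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(peak - t) <= tol for t in theoretical) for peak in observed)


def best_match(observed, database, tol=0.2):
    """Return the name of the database entry that explains the most peaks."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Hypothetical two-entry database and a simulated peak list, for illustration only.
TOY_DATABASE = {"toy_A": "AAKPRGGK", "toy_B": "GASPVTKLLER"}
observed_peaks = [peptide_mh(p) for p in tryptic_digest(TOY_DATABASE["toy_A"])]
print(best_match(observed_peaks, TOY_DATABASE))  # prints: toy_A
```

Real search tools use probabilistic scores, allow missed cleavages and modifications, and account for mass calibration; the peak count here only shows the principle of matching a pattern of peptide masses against database entries.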
De novo sequencing of proteins uses electrospray ionization tandem mass spectrometry (MS/MS). The first stage isolates individual peptide ions, and the second breaks the peptides into fragments and uses the fragmentation pattern to determine their amino acid sequences. The sequences are then used for a database search; because the search can match by sequence similarity, it can give useful results even for proteins that are not themselves listed in the database.
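How a fragmentation pattern encodes sequence can be illustrated with a short Python sketch. It assumes an idealized, noise-free spectrum containing only singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments); the function names and the 0.01 Da tolerance are hypothetical. The mass difference between neighbouring ions of one series equals the mass of one residue, which is what lets the sequence be read off.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return singly charged b and y ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


def residues_between(ladder, tol=0.01):
    """Name the residue that explains each gap between consecutive ions of one series."""
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


b_ions, y_ions = fragment_ions("GASLK")
print(residues_between(b_ions))  # read from the N-terminus: ['A', 'S', 'I/L']
print(residues_between(y_ions))  # read from the C-terminus: ['I/L', 'S', 'A']
```

Leucine and isoleucine have identical masses, so the sketch reports them as "I/L". Real spectra are incomplete and noisy, which is why de novo tools score many candidate sequences rather than reading a single clean ladder.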
Protein mixtures can also be analyzed without prior separation. These procedures begin with proteolytic digestion of the proteins in a complex mixture. The resulting peptides are often injected onto a high-performance liquid chromatography (HPLC) column that separates peptides based on hydrophobicity. HPLC can be coupled directly to a time-of-flight mass spectrometer using electrospray ionization. Peptides eluting from the column can be identified by tandem mass spectrometry (MS/MS). Labeling with isotope tags can be used to quantitatively compare protein concentrations between two or more protein samples.
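The quantitative comparison with isotope tags reduces to a ratio of peak intensities. The Python sketch below is one simple way to summarize it, assuming each peptide of a protein has been measured as a "light" peak (one sample) and a "heavy" peak (the other sample). The function name, the input layout, and the choice of a median in log space are assumptions made for the example, not a description of any particular software.

```python
import math
import statistics


def protein_ratio(peptide_pairs):
    """Median heavy/light ratio over a protein's peptides, computed in log2 space.

    peptide_pairs is a list of (light_intensity, heavy_intensity) tuples.
    Pairs without signal in both channels are skipped.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not log_ratios:
        raise ValueError("no peptide pair with signal in both channels")
    return 2 ** statistics.median(log_ratios)


# Hypothetical (light, heavy) peak intensities for three peptides of one protein.
pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 40.0)]
print(protein_ratio(pairs))  # prints: 2.0
```

Working in log space treats a two-fold increase and a two-fold decrease symmetrically, and the median keeps one outlying peptide (here the third pair, with a ratio of 4) from dominating the protein-level estimate.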
One of the most promising developments to come from the study of human genes and proteins has been the identification of potential new drugs for the treatment of disease. This relies on genome and proteome information to identify proteins associated with a disease, which computer software can then use as targets for new drugs. For example, if a certain protein is implicated in a disease, its 3D structure provides the information to design drugs to interfere with the action of the protein. A molecule that fits the active site of an enzyme, but cannot be released by the enzyme, will inactivate the enzyme. This is the basis of new drug-discovery tools, which aim to find new drugs to inactivate proteins involved in disease. As genetic differences among individuals are found, researchers expect to use these techniques to develop personalized drugs that are more effective for the individual.
A computer technique which attempts to fit millions of small molecules to the three-dimensional structure of a protein is called "virtual ligand screening". The computer rates the quality of the fit to various sites in the protein, with the goal of either enhancing or disabling the function of the protein, depending on its function in the cell. A good example of this is the identification of new drugs to target and inactivate the HIV-1 protease. The HIV-1 protease is an enzyme that cleaves a very large HIV protein into smaller, functional proteins. The virus cannot survive without this enzyme; therefore, it is one of the most effective protein targets for killing HIV.
Distributed computing programs such as World Community Grid allow people around the world to help scientists by donating computing time. The software supplements supercomputers by using the idle processing power of millions of home computers. World Community Grid works on HIV, cancer, and protein folding; all three projects centre on protein modelling and models of protein modification. Using the data gained from distributed computing models of proteins, scientists can develop more specific and effective therapies. In addition, most enzymes act as part of complexes and networks, which also affect the way an enzyme acts in a cell. Understanding these complex networks will assist in developing drugs that affect the function of these complexes.
Biomarkers
Understanding the proteome, the structure and function of each protein and the complexities of protein-protein interactions will be critical for developing the most effective diagnostic techniques and disease treatments in the future.
One use of proteomics is the diagnosis of disease with specific protein biomarkers. A number of techniques can test for proteins produced during a particular disease, which helps to diagnose the disease quickly. Techniques include western blot, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA), and mass spectrometry. The following are some of the diseases that have characteristic biomarkers that physicians can use for diagnosis:
- In Alzheimer’s disease, elevated levels of beta-secretase produce amyloid beta-protein, which builds up as plaque in the patient's brain and causes dementia. Targeting this enzyme decreases the amyloid beta-protein and so slows the progression of the disease. One procedure to test for an increase in amyloid beta-protein is immunohistochemical staining, in which antibodies bind to the amyloid beta-protein antigen in tissue samples.
- Heart disease is commonly assessed using several key protein-based biomarkers. Standard protein biomarkers for cardiovascular disease (CVD) include interleukin-6, interleukin-8, serum amyloid A protein, fibrinogen, and troponins. Cardiac troponin I (cTnI) increases in concentration within 3 to 12 hours of initial cardiac injury and can remain elevated for days after an acute myocardial infarction (MI). A number of commercial antibody-based assays, as well as other methods, are used in hospitals as primary tests for acute MI.
- Proteomic analysis of normal and cancerous kidney cells is producing promising leads for biomarkers of renal cell carcinoma and for assays to test for this disease. In kidney-related diseases, urine is a potential source of such biomarkers. It has been shown that urinary polypeptides identified as biomarkers of kidney-related diseases allow the severity of the disease to be diagnosed several months before the appearance of the pathology (Decramer et al., 2006).
Branches
- Protein separation. Proteomic technologies rely on the ability to separate a complex mixture so that individual proteins are more easily processed with other techniques.
- Protein identification. Well-known methods include low-throughput sequencing through Edman degradation. Higher-throughput proteomic techniques are based on mass spectrometry, commonly peptide mass fingerprinting (PMF) on MALDI-TOF instruments, or tandem mass spectrometry (MS/MS) on instruments capable of more than one round of mass spectrometry. MS/MS data can be analyzed by simple database searches, as is the case for PMFs, and additionally by de novo sequencing and homology searching. The latter approach can even identify similar (homologous) proteins, e.g. across species when a protein comes from an organism whose genome has not been sequenced. Antibody-based assays can also be used, but each is specific to one sequence motif.
- In quantitative proteomics different methods are used to obtain quantitative information on a proteome-wide scale. Rather than just lists of proteins, quantitative proteomics provides functional information and reveals temporal changes in the proteome.
- Protein sequence analysis is a branch of bioinformatics that deals with searching databases for possible protein or peptide matches using search engines such as Mascot, PEAKS, OMSSA, SEQUEST, X!Tandem, and the PWB Protein Identification Cluster Software Solution, as well as with the functional assignment of domains, the prediction of function from sequence, and the evolutionary relationships of proteins.
- Structural proteomics concerns the high-throughput determination of protein structures in three-dimensional space. Common methods are x-ray crystallography and NMR spectroscopy.
- Interaction proteomics concerns the investigation of protein interactions on the atomic, molecular and cellular levels. See also protein-protein interaction prediction.
- Protein modification studies the modified forms of proteins. Almost all proteins are modified from their pure translated amino-acid sequence by so-called post-translational modification. Specialized methods have been developed to study phosphorylation (phosphoproteomics) and glycosylation (glycoproteomics).
- Cellular proteomics is a new branch of proteomics that aims to map the location of proteins and protein-protein interactions in whole cells during key cell events. It centers on techniques such as X-ray tomography and optical fluorescence microscopy.
- Experimental bioinformatics, a term coined by Matthias Mann, is a branch of bioinformatics as applied in proteomics. It involves the mutual design of experimental and bioinformatics methods to create (extract) new types of information from proteomics experiments.
Technologies
Proteomics uses various technologies:
- One- and two-dimensional gel electrophoresis are used to determine the relative mass of a protein and its isoelectric point.
- X-ray crystallography and nuclear magnetic resonance are used to characterize the three-dimensional structure of peptides and proteins. Lower-resolution techniques such as circular dichroism, Fourier transform infrared spectroscopy and small-angle X-ray scattering (SAXS) can also be used to study the secondary structure of proteins.
- Tandem mass spectrometry combined with reverse-phase chromatography or 2-D electrophoresis is used to identify proteins, using database search tools such as Mascot, Phenyx, PEAKS, OMSSA, X!Tandem and SEQUEST or de novo algorithms, and to quantify the levels of proteins found in cells.
- Scaffold is a software tool used to visualize tandem mass spectrometry results.
- Tandem mass spectrometry combined with tagging technologies such as TMT, ICPL or iTRAQ is used for quantification of proteins and peptides.
- Mass spectrometry (non-tandem), often MALDI-TOF, is used to identify proteins by peptide mass fingerprinting. This technology is also used in so-called "MALDI-TOF MS protein profiling", where samples (e.g. serum) are prepared with protein chips (SELDI-TOF MS), with magnetic beads (the Bruker Daltonics protein profiling platform) or with other methods of sample treatment, such as liquid chromatography, size exclusion and immunoaffinity. Protein peaks of interest must then be identified by tandem mass spectrometry. Protein profiling with MALDI-TOF MS could be very useful in clinical diagnostics, but so far there has been little success in advancing it to clinical validation because of high analytical variation.
- ICP-MS combined with Metal Coded Tagging (MeCAT) technology is used for ultrasensitive quantification of proteins and peptides, down to the low attomole range.
- Affinity chromatography, yeast two-hybrid techniques, fluorescence resonance energy transfer (FRET), and surface plasmon resonance (SPR) are used to identify protein-protein and protein-DNA binding reactions.
- X-ray tomography is used to determine the location of labeled proteins or protein complexes in an intact cell. The results are frequently correlated with images of cells from light-based microscopes.
- Software-based image analysis is used to automate the detection and quantification of spots within and among gel samples. Although this technology is widely used, it is not yet fully reliable. For example, the leading software tools tend to agree on the analysis of well-defined, well-separated protein spots, but they deliver different results for less-defined, less-separated spots, so manual verification of results is still needed.
See also
- proteomic chemistry
- bioinformatics
- cytomics
- genomics
- List of omics topics in biology
- metabolomics
- lipidomics
- Shotgun proteomics
- Top-down proteomics
- Bottom-up proteomics
- systems biology
- transcriptomics
- phosphoproteomics
- PEGylation
Protein databases
- UniProt
- PIR
- Swiss-Prot
- PDB
- NCBI
- Human Protein Reference Database
Proteomics Tools
- Protein Information Crawler (PIC) - Software for extensive spidering of multiple protein information resources for large protein sets
References
- Anderson NL, Anderson NG (1998). "Proteome and proteomics: new technologies, new concepts, and new words". Electrophoresis 19 (11): 1853–61. doi:10.1002/elps.1150191103. PMID 9740045.
- Blackstock WP, Weir MP (1999). "Proteomics: quantitative and physical mapping of cellular proteins". Trends Biotechnol. 17 (3): 121–7. PMID 10189717.
Bibliography
- Belhajjame, K. et al. Proteome Data Integration: Characteristics and Challenges. Proceedings of the UK e-Science All Hands Meeting, ISBN 1-904425-53-4, September 2005, Nottingham, UK.
- Twyman, R. M. 2004. Principles of proteomics. BIOS Scientific Publishers, New York. ISBN 1-85996-273-4. (Covers almost all branches of proteomics.)
- Westermeier, R. and T. Naven. 2002. Proteomics in practice: a laboratory manual of proteome analysis. Wiley-VCH, Weinheim. ISBN 3-527-30354-5. (Focused on 2D gels, good on detail.)
- Liebler, D. C. 2002. Introduction to proteomics: tools for the new biology. Humana Press, Totowa, NJ. ISBN 0-585-41879-9 (electronic, on Netlibrary?), ISBN 0-89603-991-9 hardback, ISBN 0-89603-992-7 paperback.
- Wilkins MR, Williams KL, Appel RD, Hochstrasser DF. Proteome research: new frontiers in functional genomics. Berlin Heidelberg, Springer Verlag; 1997, ISBN 3-540-62753-7.
- Arora, Pankaj S., et al. (2005). "Comparative evaluation of two two-dimensional gel electrophoresis image analysis software applications using synovial fluids from patients with joint disease". Journal of Orthopaedic Science 10 (2): 160-166.
- Rediscovering Biology Online Textbook. Unit 2 Proteins and Proteomics. 1997-2006.
- Weaver. R.F. Molecular Biology. Third Edition. The McGraw-Hill Companies Inc. 2005. pgs 840-849.
- Campbell and Reece. Biology. Sixth Edition. Pearson Education Inc. 2002. pg 392-393.
- Hye A, Lynham S, Thambisetty M, et al. " Proteome-based plasma biomarkers for Alzheimer's disease." Brain 129: 3042-3050, (2006).
- Perroud B, Lee J, Valkova N, et al. "Pathway Analysis of Kidney Cancer Using Proteomics and Metabolic Profiling." Biomed Central: 65-82, (24 November 2006).
- Macaulay IC, Carr P, Gusnanto A, et al. "Platelet Genomics and Proteomics in Human Health and Disease." The Journal of Clinical Investigation 115: 3370-3377, (December 2005).
- Rogers MA, Clarke P, Noble J, et al. "Proteomic Profiling of Urinary Proteins in Renal Cancer by Surface Enhanced Laser Desorption Ionization, and Neural-Network Analysis: Identification of Key Issues Affecting Clinical Potential Utility." Cancer Research 63: 6971-6983, (15 October 2003).
- Vasan RS. “Biomarkers of cardiovascular disease: molecular basis and practical considerations” Circulation. 2006;113:2335-2362.
- “Myocardial Infarction”. http://medlib.med.utah.edu/WebPath/TUTORIAL/MYOCARD/MYOCARD.html (Retrieved 29 Nov 2006)
- World Community Grid. http://www.worldcommunitygrid.org (Retrieved 29 Nov 2006)
- Introduction to Antibodies - Enzyme-Linked Immunosorbent Assay (ELISA). http://www.chemicon.com/resource/ANT101/a2C.asp. (Retrieved 29 Nov 2006)
- Decramer S et al. "Predicting the clinical outcome of congenital unilateral ureteropelvic junction obstruction in newborn by urinary proteome analysis". Nature Medicine 2006; 12:398-400.
- Mayer, U. (2008) Protein Information Crawler (PIC): Extensive spidering of multiple protein information resources for large protein sets. Proteomics, 8: 42-44.