Title |
History of Bioinformatics |
Authors |
Jong Bhak |
Contact |
j@bio.cc, BiO Centre, Cambridge, UK, +44 1223 524889 |
BiopaperNumber |
BiO20030320.00001 |
Cite this as |
J. Park, (2003), The history of Bioinformatics, BiO On-line publication. UniqueBioPaperNumber (UBIPAN): BiO20030320.00001 http://bio.cc/Bioinformatics/history_of_bioinformatics.html |
Publication Date |
2003. March. 20th. |
Paper Type |
non-research paper. |
Intellectual Property |
(c) copyright. Please refer to the above URL for reference. |
Related Papers |
Abstract:
Modern bioinformatics is broadly composed of two main disciplines: biological science and computer science. Understanding the history of any academic discipline gives new learners a wider and more accurate perspective on their research. Here, a succinct chronology of historical events in both biology and computer science is presented.
Introduction:
The history of biology in general, B.C. and before the discovery of genetic inheritance by G. Mendel in 1865, is extremely sketchy and inaccurate. There is also a strong bias toward Western civilization. Therefore, this part of the history should be viewed as an extremely rough guide to how much pre-biology people knew about life. The advancement of computing in the 1960s-70s produced the basic methodology of bioinformatics. However, it was in the 1990s, with the arrival of the Internet, that the full-fledged field of bioinformatics was born.
Results:
1843: Richard Owen elaborated the distinction of homology and analogy.
1850-1855: Jean-Baptiste Boussingault, who had proved that the carbon in plants came from atmospheric CO2, proposes that plant nitrogen comes from the soil, and demonstrates that higher plants cannot utilize atmospheric nitrogen, but only nitrates from the soil. He also demonstrates the necessity of nitrogen for plants and animals. The question he addressed was whether the nitrogen that plants need to grow came from the soil or from the air. Joseph Priestley had argued, in the 18th century, in favor of the air, and his opinion was seconded in the early 19th century by Liebig, then the world's most famous chemist. Boussingault's experimental results were not conclusive, however, and conflicting data were soon published by another Parisian chemist, Ville, and popularized by Liebig.
1855: Alfred Russel Wallace publishes On the Law Which Has Regulated the Introduction of New Species.
1858: Charles Darwin and Alfred Wallace publish papers on the theory of evolution.
1859: Charles Darwin publishes On the Origin of Species, vastly strengthening the adaptationist hypothesis.
1864: Ernst Haeckel (Häckel) outlines the essential elements of modern zoological classification
1865: Gregor Mendel (1822-1884), Austria, establishes the laws of genetic inheritance, the foundation of the theoretical study of genetics. English translation: Experiments in Plant Hybridisation. Cambridge, MA: Harvard University Press. His work, in German, was first published in 1865 in the Proceedings of the Brünn Society for Natural History, Brünn, Austria (Hewlett, 1998). It was ignored for a generation.
1869: Friedrich Miescher discovers nuclein in the cell nucleus: an acidic substance, rich in phosphate (PO4) and lacking the sulfur characteristic of protein. It is now known as nucleic acid.
1902: The chromosome theory of heredity is proposed by Sutton and Boveri, working independently.
1905: The word "genetics" is coined by William Bateson.
1913: The first linkage map is created by Columbia undergraduate Alfred Sturtevant (working with T.H. Morgan).
1918-1926: Hermann J. Muller develops the view that the gene constitutes the basis of life and evolution by virtue of its property of reproducing its own internal changes. (Muller, Hermann J. (1962). Studies in Genetics. His seminal 1927 paper on X-rays may be present in this collection.)
1930: Arne Tiselius, Uppsala University, Sweden, introduces a new technique, electrophoresis, for separating proteins in solution: "The moving-boundary method of studying the electrophoresis of proteins" (published in Nova Acta Regiae Societatis Scientiarum Upsaliensis, Ser. IV, Vol. 7, No. 4).
1930s: The chemical nature of nucleic acid is investigated. It was thought to be a tetranucleotide composed of one unit each of adenylic, guanylic, thymidylic and cytidylic acids.
1936: Alan Turing, Cambridge University, describes the Turing machine, computability and the universal machine.
1941: Beadle and Tatum, Genetic Control of Biochemical Reactions in Neurospora: the first sound scientific evidence for the one-gene-one-enzyme hypothesis.
1944: Oswald Avery identifies nucleic acids as the active principle in bacterial transformation. Avery, O. T., C. M. MacLeod, and M. McCarty (1944). Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types. Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III. Journal of Experimental Medicine 79: 137-158. Also in Peters (1959). Oswald Avery (1877-1955) was a bacteriologist whose research on pneumococcus bacteria made him one of the founders of immunochemistry and laid the foundation for later discoveries that launched the science of molecular genetics.
1945: John von Neumann, Princeton University, USA, First Draft of a Report on the EDVAC, Contract No. W-670-ORD-492, Moore School of Electrical Engineering, Univ. of Penn., Philadelphia. Reprinted (in part) in Randell, Brian. 1982. Origins of Digital Computers: Selected Papers, Springer-Verlag, Berlin Heidelberg, pp. 383-392.
1946: Genetic material can be transferred laterally between bacterial cells, as shown by Lederberg and Tatum.
1948: Claude Shannon publishes "A Mathematical Theory of Communication", founding information theory.
1950: Erwin Chargaff shows that the four nucleotides are not present in nucleic acids in stable proportions, and that the nucleotide composition differs according to its biological source. Chargaff, Erwin, ed. (1955-60). The Nucleic Acids: Chemistry and Biology. New York, Academic Press.
1951: Pauling and Corey propose the structures of the alpha-helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).
1952: Alfred Day Hershey and Martha Chase proved, on the basis of their bacteriophage research, that DNA alone carries genetic information.
1953: James Dewey Watson and Francis Harry Compton Crick, Cambridge, UK, propose the double helix model for DNA based on X-ray data obtained by Franklin and Wilkins (Nature, 171: 737-738, 1953).
1953: Frederick Sanger, E. O. P. Thompson and Hans Tuppy completed the determination of the amino acid sequence of the A and B chains of insulin. Cambridge, UK.
1954: Max Perutz's group in Cambridge UK develops heavy atom methods to solve the phase problem in protein crystallography.
1956: Christian Boehmer Anfinsen and White concluded that the three-dimensional conformation of proteins is specified by their amino acid sequence.
1957: Seymour Benzer introduced the concept of the cistron: the smallest unit of function of the gene.
1958: The first integrated circuit is constructed by Jack Kilby at Texas Instruments.
1958: The Advanced Research Projects Agency (ARPA) is formed in the US.
1958: Francis Harry Compton Crick, Cambridge, UK, enunciated the central dogma of molecular genetics: information flows from DNA to RNA to protein.
1960: François Jacob and Jacques Lucien Monod proposed the operon hypothesis for the regulation of enzyme synthesis.
1961: Sidney Brenner, François Jacob and Matthew Meselson identify messenger RNA.
1961-1965: The laboratories of Robert William Holley, Marshall Warren Nirenberg, Har Gobind Khorana and Severo Ochoa identified the genetic code words for the amino acids.
1965: Margaret Dayhoff publishes the first Atlas of Protein Sequence and Structure, which contains sequence information on 65 proteins.
1967: W.M. Fitch and E. Margoliash calculated the phylogenetic relationships of twenty organisms, ranging from fungi to mammals, by comparing their cytochrome c amino acid sequences.
1968: Packet-switching network protocols are presented to ARPA.
1968: Kimura, M. Evolutionary rate at the molecular level. Nature 217 (1968) 624-626.
1969: The ARPANET is created by linking computers at the Stanford Research Institute (SRI), UCSB, the University of Utah and UCLA.
1970s: Fred Sanger, Cambridge, UK, develops the dideoxy (chain-termination) DNA sequencing method.
1970: Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443-53.
1970: Fitch, W. M. Distinguishing homologous from analogous proteins. Syst Zool (1970) 19:99-113.
1970: The first restriction enzyme was isolated.
1971: Lynn Margulis proposed an endosymbiont theory for the origins of eukaryotic organelles.
1971: Ray Tomlinson (BBN) invents the email program.
1971: MEDLINE is launched by the National Library of Medicine (NIH).
1972: The first recombinant DNA molecule is created by Paul Berg and his group.
1973: The Brookhaven Protein Data Bank is announced (Acta Cryst. B, 1973, 29: 1746).
1973: Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet.
1974: Langley, C.H. and Fitch, W.M., An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3 (1974) 161-177.
1974: Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an "internet" and develop the Transmission Control Protocol (TCP).
1974: Charles Goldfarb invents SGML (Standard Generalized Markup Language).
1974: Chothia, C. Hydrophobic bonding and accessible surface area in proteins. Nature 1974 Mar 22;248(446):338-9.
1974: Chou PY, Fasman GD. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry. 1974 Jan 15;13(2):211-22.
1975: Microsoft Corporation is founded by Bill Gates and Paul Allen.
1975: César Milstein's group (with Georges Köhler) produces monoclonal antibodies.
1975: King and Wilson suggest that the genetic difference between chimpanzees and humans is small. King, M.C. and A.C. Wilson (1975). Evolution at two levels in Humans and Chimpanzees. Science 188: 107-116.
For an update on the topic, see Gibbons 1998; for recent work on multiple transcriptional controls, see Tjian and Holmes 2000.
1975: Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O'Farrell (J. Biol. Chem., 250: 4007-4021, 1975).
1975: E. M. Southern published the experimental details of the Southern blot technique for detecting specific sequences of DNA (J. Mol. Biol., 98: 503-517, 1975).
1976: The Unix-to-Unix Copy Protocol (UUCP) is developed at AT&T Bell Labs; it is distributed with UNIX one year later. Dr. Robert M. Metcalfe develops Ethernet, which allows coaxial cable to move data extremely fast and becomes a crucial component in the development of LANs. The packet satellite project goes into practical use: SATNET, the Atlantic Packet Satellite Network, links the United States with Europe. Notably, it uses INTELSAT satellites owned by a consortium of countries rather than exclusively by the United States government. The Department of Defense begins to experiment with the TCP/IP protocol and soon decides to require it for use on ARPANET.
1977: Roger Staden, MRC Laboratory of Molecular Biology (LMB), Cambridge, UK, publishes the Staden programs, DNA sequence analysis software, in Nucleic Acids Research (NAR).
1977: The full description of the Brookhaven PDB (http://www.pdb.bnl.gov) is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.J.; J. Mol. Biol., 1977, 112: 535).
1977: Procedures were developed for rapidly sequencing long sections of DNA.
1978: The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.
1979: Goodman, M., Czelusniak, J., Moore, G. W., Romero-Herrera, A. E., and Matsuda, G. Fitting the gene lineage into its species lineage: A parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. (1979) 28:132-168.
1977: The first complete DNA genome sequence, of the bacteriophage phiX174, is published by Sanger's group, Cambridge, UK. The genome consists of 5,386 nucleotides, which code for nine proteins.
1980: Wüthrich et al. publish a paper detailing the use of multi-dimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).
1981: The Smith-Waterman algorithm for sequence alignment is published. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.
1981: Sequence motif, Russell Doolittle.
1981: IBM introduces its Personal Computer to the market.
1981: Felsenstein, J. Evolutionary Trees from DNA-Sequences - a Maximum-Likelihood Approach. J. Mol. Evol. (1981) 17:368-376.
1982: Genetics Computer Group (GCG) is created as a part of the University of Wisconsin Biotechnology Center. Its primary product is the Wisconsin Suite of molecular biology tools.
1982: GenBank is established at Los Alamos National Laboratory (LANL); it is later run by NCBI and exchanges data with EMBL.
1983: The Compact Disk (CD) is launched.
1983: Name servers are developed at the University of Wisconsin.
1983: Kary B. Mullis invents the polymerase chain reaction (PCR), a method for rapidly and easily amplifying DNA fragments.
1984: The Domain Name System (DNS), developed by Paul Mockapetris and Jon Postel, is placed on-line.
1984: The Macintosh is announced by Apple Computer.
1985: The FASTP algorithm by David Lipman and Bill Pearson is published.
1985: The PCR reaction is described by Kary Mullis and co-workers.
1985: Richard Stallman founds the Free Software Foundation.
1986: The SWISS-PROT database is created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).
1987: The use of yeast artificial chromosomes (YAC) is described (David T. Burke, et al., Science, 236: 806-812).
1987: McClintock, Barbara (1987). The Discovery and Characterization of Transposable Elements: The Collected Papers of Barbara McClintock. New York: Garland, 1987. In her 1983 Nobel lecture, McClintock said the genome is "a highly sensitive organ of the cell, that in times of stress could initiate its own restructuring and renovation." See the biography at the Cold Spring Harbor site. For a current discussion, see Pennisi 1998.
1987: The physical map of E. coli is published (Y. Kohara, et al., Cell 51: 319-337).
1987: Perl (Practical Extraction and Report Language) is released by Larry Wall.
1988: The National Center for Biotechnology Information (NCBI) is established at the National Library of Medicine (NLM), NIH.
1988: Christian Marck publishes DNA Strider, a DNA and protein sequence analysis program for the Apple Macintosh.
1988: The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome. National Academy Press: Washington, D.C., 1988).
1988: The FASTA algorithm for sequence comparison is published by Pearson and Lipman.
1989: The Genetics Computer Group (GCG) becomes a private company.
1989: Oxford Molecular Group, Ltd. (OMG) founded in Oxford, UK by Anthony Marchington, David Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products: Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein design).
1990: The BLAST program (Altschul et al.) is published. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.
1990: Tim Berners-Lee and CERN in Geneva implement a hypertext system to provide efficient information access to the members of the international high-energy physics community; Berners-Lee develops the first version of HTTP and publishes the first HTML document. Merit, IBM and MCI form a not-for-profit corporation called ANS (Advanced Network & Services) to conduct research into high-speed networking. It soon comes up with the concept of the T3, a 45 Mbps line. NSF quickly adopts the new network, and by the end of 1991 all of its sites are connected by this new backbone. While the T3 lines are being constructed, the Department of Defense disbands the ARPANET, which is replaced by the NSFNET backbone. The original 50 kbps lines of ARPANET are taken out of service.
1991: Linus Torvalds announces a Unix-like operating system which later becomes Linux.
1991: The creation and use of expressed sequence tags (ESTs) is described (J. Craig Venter, et al., Science, 252: 1651-1656).
1992: FSSP, a global database of protein structural families, is published by Liisa Holm et al. Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G. A database of protein structure families with common folding motifs. Protein Sci 1992 Dec;1(12):1691-1698.
1992: Cyrus Chothia, Cambridge, UK, suggests that the number of protein families is approximately 1000. Chothia C. Proteins. One thousand families for the molecular biologist. Nature, 1992 June, 357: 543-544.
1992: Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature. 1992 Jul 2;358(6381):86-9.
1993: Dali is published in JMB by Liisa Holm and Chris Sander. Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol 1993 Sep 5;233(1):123-138.
1993: Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993 Jul 20;232(2):584-99.
1993: InterNIC is created by NSF to provide specific Internet services: directory and database services (by AT&T), registration services (by Network Solutions Inc.) and information services (by General Atomics/CERFnet). Marc Andreessen and colleagues at NCSA, University of Illinois, develop a graphical user interface to the WWW called "Mosaic for X".
1993: Hidden Markov model (HMM) based algorithms for sequence analysis become popular.
1993: Affymetrix begins independent operations in Santa Clara, California
1993: Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., & Wootton, J. C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993, 262(5131), 208-14.
1994: The first CASP (protein structure prediction meeting) is held at Asilomar, California. Hidden Markov model, iterative search and threading methods are successful in predicting protein structures.
1994: Leonard Adleman demonstrates DNA computing.
1995: The genome of Haemophilus influenzae (1.8 Mb), the first from a free-living organism, is sequenced.
1995: The SCOP (Structural Classification of Proteins) database is published.
1995: The genome of Mycoplasma genitalium, the smallest free-living organism, is sequenced.
1995: The first open-community bioinformatics project, BioPerl (with sister projects such as BioJava and BioLinux), is initiated by Jong Park and Steve Brenner, MRC Centre, Cambridge, UK (history_of_bioperl.html).
1996: The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is sequenced.
1996-1997: The first cloning of a mammal from an adult somatic cell (Dolly the sheep) is performed by Ian Wilmut and colleagues at the Roslin Institute in Scotland.
1996: Affymetrix produces the first commercial DNA chips.
1997: The genome for E. coli (4.7 Mbp) is published.
1997: J. Park et al. publish the Intermediate Sequence Search method, demonstrating the validity of homology transitivity in sequence searches by using a structural homology benchmark set based on SCOP.
1997: Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997 Oct 3;278(5335):82-7.
1997: The PSI-BLAST algorithm is published. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.
1998: The genomes for Caenorhabditis elegans and baker's yeast are published.
1998: Complete genomes show extensive gene/protein sequence/structure duplication. Teichmann et al., PNAS.
1998: J. Park et al. show that sequence search algorithms based on multiple sequences use much more homology information than pairwise methods.
1998: Inpharmatica, a new Genomics and Bioinformatics company, is established by University College London, the Wolfson Institute for Biomedical Research, five leading scientists from major British academic centers and Unibio Limited.
1999: The Protein Structural Interactome Map (PSIMAP), including the first full-genome interaction network built using the PDB and the yeast two-hybrid system, is created by members of Liisa Holm's group at the EBI, Cambridge, UK (J. Park, Liisa Holm, Michael Lappe) and S. Teichmann. It is the first phylogenetic interaction network, the first map using protein domains and the first global interaction network.
1999: Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999 Jul 30;285(5428):751-3.
1999: Bush, R. M., Bender, C. A., Subbarao, K., Cox, N. J., and Fitch, W. M. Predicting the evolution of human influenza A. Science (1999) 286:1921-1925.
1999: Barabasi AL, Albert R. Emergence of scaling in random networks. Science 1999 Oct 15;286(5439):509-12.
2000: Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature 2000 Oct 5;407(6804):651-4.
2000: The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.
2000: The A. thaliana genome (100 Mb) is sequenced.
2000: The D. melanogaster genome (180 Mb) is sequenced.
2001: The draft human genome (3 billion base pairs) is published.
2002:
Online References:
The history of the Internet: http://www.davesite.com/webstation/net-history.shtml
Allen B. Richon, E-mail: arichon@netsci.org http://www.netsci.org/Science/Bioinform/feature06.html
Internet history: http://members.magnet.at/dmayr/history.htm
Biological history to 1953: http://www.mun.ca/biology/scarr/2250_History.htm
Long history of biology: http://www.crevola.com/laurent/sitelolo/histoire/historybc.html
http://cumicro2.cpmc.columbia.edu/icb/ (Lecture 1: http://cumicro2.cpmc.columbia.edu/icb/Lecture%201.pdf), jovanovic@cancercenter.columbia.edu
Classical papers in bioinformatics: http://www.sbc.su.se/~per/classics-bioinfo/
About Darwinism: http://www.aboutdarwin.com/literature/Pre_Dar.html
Theoretical Biology: http://www.zbi.ee/~uexkull/theor.htm
John Blamire: http://www.brooklyn.cuny.edu/bc/ahp/MBG/MBG3/MBG.C3.Question.html
===============================================================
Off-line References
J. Cairns, G. Stent, & J. Watson (1966). Phage and the Origins of Molecular Biology. Freeman.
[Biographical essays on the early days by the founders of molecular genetics]
F. H. C. Crick (1988). What Mad Pursuit? Basic Books.
[Crick's version of the 'double helix' history, and lots more]
L. Gonick & M. Wheelis (1991). The Cartoon Guide to Genetics, 2nd ed. Harper Collins.
[Great illustrations: a good primer of basic Mendelian and molecular genetics]
H. F. Judson (1979). The Eighth Day of Creation. Simon & Schuster.
[A general history of molecular biology]
A. Sayre (1975). Rosalind Franklin and DNA. Norton.
[A re-appraisal of the role of Franklin, with commentary on the role of women in science]
G. Stent (1971). Molecular Genetics: an introductory narrative. Freeman.
[A classic, now factually dated textbook, still highly readable]
J. D. Watson (1968). The Double Helix. Atheneum.
[An entertaining, irreverent, sexist, account of the discovery of the structure of DNA.
See the accounts of Crick and Sayre for an antidote]
History of Genetics: From Prehistoric Times to the Rediscovery of Mendel's Laws by Hans Stubbe (MIT Press, out of print)
A History of Genetics by Alfred Sturtevant
The Eighth Day of Creation by Horace Judson (focus on molecular biology)
The Century of the Gene by Evelyn Fox Keller
Cracking the Genome : Inside the Race to Unlock Human DNA by Kevin Davies
========================================================================
j@bio.cc
BiO