Introduction To Bioinformatics

E-Book Overview

Written by a pioneer of the use of bioinformatics in research, the second edition of Introduction to Bioinformatics introduces the student to the power of bioinformatics as a set of scientific tools. Retaining and enhancing the rich pedagogy and lucid presentation of the first edition, this new edition explains how to access the data archives of genomes and proteins, and the kind of questions these data and tools can answer. It also discusses how to make inferences from the data archives, how to make connections among them, and how to derive useful and interesting predictions. The book is accompanied by a fully integrated companion website.

E-Book Content

1 Introduction

A scenario
Life in space and time
Dogmas: central and peripheral
Observables and data archives
Curation, annotation, and quality control
The World Wide Web
    The hURLy-bURLy
    Electronic publication
Computers and computer science
    Programming
Biological classification and nomenclature
Use of sequences to determine phylogenetic relationships
    Use of SINES and LINES to derive phylogenetic relationships
Searching for similar sequences in databases
Introduction to protein structure
    The hierarchical nature of protein architecture
    Classification of protein structures
Protein structure prediction and engineering
    Critical Assessment of Structure Prediction (CASP)
    Protein engineering
Clinical implications
    The future
Recommended reading
Exercises, Problems, and Weblems

Biology has traditionally been an observational rather than a deductive science. Although recent developments have not altered this basic orientation, the nature of the data has radically changed. It is arguable that until recently all biological observations were fundamentally anecdotal – admittedly with varying degrees of precision, some very high indeed. However, in the last generation the data have become not only much more quantitative and precise, but, in the case of nucleotide and amino acid sequences, they have become discrete. It is possible to determine the genome sequence of an individual organism or clone not only completely, but in principle exactly. Experimental error can never be avoided entirely, but for modern genomic sequencing it is extremely low.

Not that this has converted biology into a deductive science. Life does obey principles of physics and chemistry, but for now life is too complex, and too dependent on historical contingency, for us to deduce its detailed properties from basic principles.

A second obvious property of the data of bioinformatics is their sheer volume.
Currently the nucleotide sequence databanks contain 6 × 10^9 bases (abbreviated 6 Gbp). If we use the approximate size of the human genome – 3 × 10^9 letters – as a unit, this amounts to two HUman Genome Equivalents (or 2 huges, an apt name). For a comprehensible standard of comparison, 1 huge is comparable to the number of characters appearing in six complete years of issues of The New York Times. The database of macromolecular structures contains 15 000 entries, the full three-dimensional coordinates of proteins, of average length ∼400 residues. Not only are the individual databanks large, but their sizes are increasing at a very high rate. Figure 1.1 shows the growth over the past decade of GenBank (archiving nucleic acid sequences) and the Protein Data Bank (PDB) (archiving macromolecular structures). It would be precarious to extrapolate.

[Figure 1.1, panel (a); axis: Size (Mbp)]
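
The unit arithmetic in this passage can be checked with a short calculation. The following is a minimal sketch that assumes only the figures quoted in the text (about 6 × 10^9 bases in the nucleotide databanks and about 3 × 10^9 letters per human genome). The constant and function names are hypothetical, introduced here for illustration, and are not taken from the book.

```python
# Figures quoted in the text; all names below are illustrative assumptions.
HUMAN_GENOME_BASES = 3e9   # approximate size of the human genome, in letters
DATABANK_BASES = 6e9       # nucleotide sequence databank holdings, in bases


def to_huges(bases: float) -> float:
    """Express a number of bases in HUman Genome Equivalents (huges)."""
    return bases / HUMAN_GENOME_BASES


def to_gbp(bases: float) -> float:
    """Express a number of bases in gigabase pairs (1 Gbp = 10^9 bases)."""
    return bases / 1e9


if __name__ == "__main__":
    # Prints: 6 Gbp = 2 huges
    print(f"{to_gbp(DATABANK_BASES):g} Gbp = {to_huges(DATABANK_BASES):g} huges")
```

Dividing by a named constant keeps the definition of the huge in one place, so the same helper can be reused if the databank figure is updated.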