The Machine Learning Approach
Second Edition

An unprecedented wealth of data is being generated by genome sequencing projects and other experimental efforts to determine the structure and function of biological molecules. The demands and opportunities for interpreting these data are expanding rapidly. Bioinformatics is the development and application of computer methods for management, analysis, interpretation, and prediction, as well as for the design of experiments. Machine learning approaches (e.g., neural networks, hidden Markov models, and belief networks) are ideally suited for areas where there is a lot of data but little theory, which is the situation in molecular biology. The goal in machine learning is to extract useful information from a body of data by building good probabilistic models—and to automate the process as much as possible.

In this book, Pierre Baldi and Søren Brunak present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological data. The book is aimed both at biologists and biochemists who need to understand new data-driven algorithms and at those with a primary background in physics, mathematics, statistics, or computer science who need to know more about applications in molecular biology.

This new second edition contains expanded coverage of probabilistic graphical models and of the applications of neural networks, as well as a new chapter on microarrays and gene expression. The entire text has been extensively revised.

Table of Contents

  Series Foreword
  Preface
  1. Introduction
  2. Machine-Learning Foundations: The Probabilistic Framework
  3. Probabilistic Modeling and Inference: Examples
  4. Machine Learning Algorithms
  5. Neural Networks: The Theory
  6. Neural Networks: Applications
  7. Hidden Markov Models: The Theory
  8. Hidden Markov Models: Applications
  9. Probabilistic Graphical Models in Bioinformatics
  10. Probabilistic Models of Evolution: Phylogenetic Trees
  11. Stochastic Grammars and Linguistics
  12. Microarrays and Gene Expression
  13. Internet Resources and Public Databases
  Appendix A: Statistics
  Appendix B: Information Theory, Entropy, and Relative Entropy
  Appendix C: Probabilistic Graphical Models
  Appendix D: HMM Technicalities, Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures
  Appendix E: Gaussian Processes, Kernel Methods, and Support Vector Machines
  Appendix F: Symbols and Abbreviations
  References
  Index