Algorithms in computational Biology 

Algorithms and probabilistic models that arise in various computational biology applications, including:

  • Pattern matching with strings: suffix trees, suffix arrays, Burrows Wheeler Transform, inexact matches with dynamic programming, practical DNA sequence alignment
  • Alignment of biological networks
  • Phylogenetics: inference of phylogenetic trees, perfect phylogeny, likelihood estimation
  • Hidden Markov models with application to chromatin state annotation.

Prerequisite: Computer Science 70 and 170. Experience programming in a language such as C, C++, Java, or Python. There are no biology prerequisites for this course.

Last year’s syllabus

Lecture # Subject (general) Subject
1 Introduction
2 Exact string matching Naïve algorithm, Z-algorithm
3 Exact string matching Suffix trees
4 Exact string matching suffix arrays
5 Exact string matching Suffix array – linear time construction (Kärkkäinen and Sanders algorithm)
6 Exact string matching Introduction to Burrows Wheeler Transform
7 Exact string matching Pattern matching with BWT;  the FM index
8 Inexact matching Introduction to inexact matching and the Bowtie algorithm for DNA sequence alignment
9 Inexact matching Dynamic programming algorithms for variations of global sequence alignment (e.g., Needleman-­Wunsch)
10 Inexact matching Local alignment: Smith Waterman algorithm; Linear-space alignment (Hirschberg algorithm)
Mid term 1
11 Graph querying Graph querying: introduction and notations, Path queries
12 Graph querying Tree queries
13 Phylogenetics Introduction; tree-metrics and ultra-metrics
14 Phylogenetics Phylogeny reconstruction with a distance based approach: UPGMA and Neighbor-Joining
15 Phylogenetics Phylogeny reconstruction with a parsimony based approach: Fitch algorithm.
16 Phylogenetics Sankoff’s weight parsimony algorithm; The perfect phylogeny problem
17 Phylogenetics A linear-time perfect phylogeny algorithm   (split equivalence theorem);
18 Phylogenetics The general phylogeny reconstruction problem (Steiner trees in hypercubes)
19 Phylogenetics Markov processesd and estimating the likelihood of phylogenetic inference (Felsenstein pruning algorithm and the Jukes-Cantor model)
Mid term 2
20 HMM Introduction to HMM, forward & backward algorithms;
21 HMM Decoding (Viterbi, posterior)
22 HMM Parameter estimation using EM algorithm (Baum-Welch);
23 HMM Proof of Baum-Welch convergence; Sequence alignment with pair HMM
24 HMM Presentations from TA’s on their current research in compbio


Recommended text books:

  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, by Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison, Cambridge University Press, 1999
  • Algorithms on Strings, Trees and Sequences, by Dan Gusfield, Cambridge University Press, 1997
  • The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching, by Adjeroh, Donald, Bell, Timothy, Mukherjee, and Amar. Sringer 2008.