**Algorithms in computational Biology **

Algorithms and probabilistic models that arise in various computational biology applications, including:

- Pattern matching with strings: suffix trees, suffix arrays, Burrows Wheeler Transform, inexact matches with dynamic programming, practical DNA sequence alignment
- Alignment of biological networks
- Phylogenetics: inference of phylogenetic trees, perfect phylogeny, likelihood estimation
- Hidden Markov models with application to chromatin state annotation.

Prerequisite: Computer Science 70 and 170. Experience programming in a language such as C, C++, Java, or Python. There are no biology prerequisites for this course.

**Last year’s syllabus**

Lecture # |
Subject (general) |
Subject |

1 | Introduction | |

2 | Exact string matching | Naïve algorithm, Z-algorithm |

3 | Exact string matching | Suffix trees |

4 | Exact string matching | suffix arrays |

5 | Exact string matching | Suffix array – linear time construction (Kärkkäinen and Sanders algorithm) |

6 | Exact string matching | Introduction to Burrows Wheeler Transform |

7 | Exact string matching | Pattern matching with BWT; the FM index |

8 | Inexact matching | Introduction to inexact matching and the Bowtie algorithm for DNA sequence alignment |

9 | Inexact matching | Dynamic programming algorithms for variations of global sequence alignment (e.g., Needleman-Wunsch) |

10 | Inexact matching | Local alignment: Smith Waterman algorithm; Linear-space alignment (Hirschberg algorithm) |

Mid term 1 | ||

11 | Graph querying | Graph querying: introduction and notations, Path queries |

12 | Graph querying | Tree queries |

13 | Phylogenetics | Introduction; tree-metrics and ultra-metrics |

14 | Phylogenetics | Phylogeny reconstruction with a distance based approach: UPGMA and Neighbor-Joining |

15 | Phylogenetics | Phylogeny reconstruction with a parsimony based approach: Fitch algorithm. |

16 | Phylogenetics | Sankoff’s weight parsimony algorithm; The perfect phylogeny problem |

17 | Phylogenetics | A linear-time perfect phylogeny algorithm (split equivalence theorem); |

18 | Phylogenetics | The general phylogeny reconstruction problem (Steiner trees in hypercubes) |

19 | Phylogenetics | Markov processesd and estimating the likelihood of phylogenetic inference (Felsenstein pruning algorithm and the Jukes-Cantor model) |

Mid term 2 | ||

20 | HMM | Introduction to HMM, forward & backward algorithms; |

21 | HMM | Decoding (Viterbi, posterior) |

22 | HMM | Parameter estimation using EM algorithm (Baum-Welch); |

23 | HMM | Proof of Baum-Welch convergence; Sequence alignment with pair HMM |

24 | HMM | Presentations from TA’s on their current research in compbio |

**Recommended text books:**

**Biological Sequence Analysis****: Probabilistic Models of Proteins and Nucleic Acids**, by Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison, Cambridge University Press, 1999**Algorithms on Strings, Trees and Sequences**, by Dan Gusfield, Cambridge University Press, 1997**The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching,**by