Peng Du, Phylogenetic inference with Gibbs Sampling

Slides

A phylogenetic tree is a tree-like representation of the evolutionary relationships among species. Inferring phylogenetic trees is a central task in many biological studies, and numerous methods have been developed for phylogenetic inference from observed DNA sequence data. One of the most popular current techniques is Bayesian inference using Markov chain Monte Carlo (MCMC), especially the Metropolis-Hastings algorithm. Metropolis-Hastings works by proposing a new tree and then accepting or rejecting it according to whether, and by how much, the new tree is "better" than the old one in terms of the likelihood that it is the true tree that generated the sequence data. This method is attractive because it can sample millions of trees according to this likelihood and return them to biologists. However, it suffers from long running times because the tree space to search is astronomically large and the posterior likelihood landscape over those trees is very difficult to explore. In this talk, I will present a different method, Gibbs sampling, which is a special case of the Metropolis-Hastings algorithm but works differently. In our new method, instead of using a random "cut" and "paste" technique to generate new trees, we generate them by placing species at "good" locations according to the likelihood. As a result, the method reaches the high-likelihood region much faster by avoiding the time wasted on "bad" moves made by Metropolis-Hastings. Results, discussion, and future directions will be provided at the end of the talk.
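To make the contrast between the two samplers concrete, here is a minimal toy sketch in Python. It is not the speaker's implementation and involves no real trees: the state is just an index into a list of candidate attachment locations for one species, and the names mh_step, gibbs_step, and toy_log_lik, as well as the log-likelihood values, are hypothetical and chosen only for illustration. The Metropolis-Hastings step proposes a location at random and accepts or rejects it, while the Gibbs-style step draws a location directly with probability proportional to its likelihood.

```python
import math
import random

def mh_step(current, log_lik, rng):
    """One Metropolis-Hastings step with a uniform (symmetric) proposal."""
    proposal = rng.randrange(len(log_lik))
    # Accept with probability min(1, L(proposal) / L(current)).
    log_ratio = log_lik[proposal] - log_lik[current]
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current  # rejected: the chain stays where it was

def gibbs_step(log_lik, rng):
    """Draw a location directly, with probability proportional to its likelihood."""
    peak = max(log_lik)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_lik]
    return rng.choices(range(len(log_lik)), weights=weights, k=1)[0]

# Hypothetical log-likelihoods for attaching one species at 5 candidate branches.
toy_log_lik = [-120.0, -118.5, -101.0, -119.2, -125.0]

rng = random.Random(0)
state = 0
for _ in range(1000):
    state = mh_step(state, toy_log_lik, rng)

gibbs_draws = [gibbs_step(toy_log_lik, rng) for _ in range(1000)]
```

In this toy setting the Gibbs-style draw never produces a rejected move, which is the sense in which Gibbs sampling is a special case of Metropolis-Hastings whose proposals are always accepted. The price is that the likelihood of every candidate location has to be evaluated at each step.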