Cuong Than, Parsimonious inference of species phylogenies from multi-locus data

A species tree models the evolutionary history of a set of organisms. Such trees give us insight into the mechanisms of evolution and let us form hypotheses about past biological events, so inferring them is of great importance. The traditional approach to species tree inference is to reconstruct a single gene tree and take it to be the species tree. However, as phylogenetic analyses of genomic data have made evident, gene trees can differ from one another and from the species tree that contains them, for various biological reasons. The new challenge, then, is to infer species phylogenies from a set of genomic regions despite the potentially conflicting evolutionary histories those regions may display. One biological process that causes incongruence between species trees and gene trees is lineage sorting. In this dissertation, we present novel algorithmic techniques for inferring species trees from a set of gene trees when the incongruence is assumed to be due to lineage sorting. Experimental results on both real and simulated data show that our methods are at least as accurate as competing methods while being much faster.
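The abstract does not spell out the parsimony criterion. One widely used parsimony criterion for lineage sorting is minimizing deep coalescences (MDC): prefer the species tree that minimizes the total number of "extra" gene lineages across all gene trees. The sketch below assumes that criterion, rooted gene trees, and one sampled allele per species. The nested-tuple tree representation and the function names (`extra_lineages`, `mdc_score`) are hypothetical and illustrative; this is not the dissertation's implementation. It only scores a fixed list of candidate species trees and does not search tree space.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple, Union

# Hypothetical representation: a rooted tree as nested tuples whose leaves are
# species labels. Assumes one sampled allele per species, so gene-tree leaves
# carry the same labels as species-tree leaves.
Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def leaf_set(node: Tree) -> Cluster:
    """All leaf labels below a node (its cluster)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clusters_with_parents(tree: Tree) -> List[Tuple[Cluster, Optional[Cluster]]]:
    """(cluster, parent cluster) for every node; the root's parent is None."""
    pairs: List[Tuple[Cluster, Optional[Cluster]]] = []

    def walk(node: Tree, parent: Optional[Cluster]) -> None:
        here = leaf_set(node)
        pairs.append((here, parent))
        if not isinstance(node, str):
            for child in node:
                walk(child, here)

    walk(tree, None)
    return pairs


def extra_lineages(gene_tree: Tree, species_tree: Tree) -> int:
    """Deep-coalescence cost of one gene tree within one species tree.

    For each species-tree branch with cluster A, the gene lineages leaving the
    top of that branch are the gene nodes whose cluster lies inside A but whose
    parent's cluster does not. Each branch contributes (lineages - 1).
    """
    gene = clusters_with_parents(gene_tree)
    total = 0
    for a, _ in clusters_with_parents(species_tree):
        exiting = sum(1 for g, p in gene if g <= a and (p is None or not p <= a))
        total += exiting - 1
    return total


def mdc_score(species_tree: Tree, gene_trees: Sequence[Tree]) -> int:
    """Total extra lineages over all gene trees; lower is more parsimonious."""
    return sum(extra_lineages(g, species_tree) for g in gene_trees)


# Usage: choose the most parsimonious among a few candidate species trees.
genes = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
best = min(candidates, key=lambda s: mdc_score(s, genes))
print(best, mdc_score(best, genes))
```

With these three toy gene trees, the candidate ((A,B),C) wins with a score of 1: the two concordant gene trees cost nothing, and the discordant ((A,C),B) gene tree costs one extra lineage on the (A,B) branch. A practical method would search over species trees rather than score a hand-picked list.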

We then show how to incorporate reticulate evolution and lineage sorting into a unified framework that can detect hybridization despite lineage sorting. Using this framework to reanalyze a yeast data set, we detected hybridization despite an extensive amount of lineage sorting.

The algorithmic techniques we have developed will allow biologists to efficiently analyze the evolutionary histories of multiple loci and to infer the phylogenetic relationships of species from such data in the presence of lineage sorting and reticulate evolutionary events.