Cuong Than, Accurate and efficient species tree reconstruction from genome-scale multi-locus data through a novel ILP formulation

Species trees model the evolutionary histories of groups of species and provide a framework within which the diversity of species can be understood. Contained within the branches of a species tree are gene trees, the evolutionary histories of individual loci in the genomes. In population genetics, the evolution of genes is modeled using the coalescent process. Phylogenetic analyses usually ignore the coalescent process, yet it has a significant effect on them, particularly when closely related species are considered. Because ancestral species might have contained multiple genetic lineages ancestral to present-day organisms, loci spread across a genome can have different histories. As genome-scale sequence data from thousands of loci become available, it is now critical to develop accurate and efficient tools for reconstructing the species tree from discordant gene trees. Current methods such as Bayesian analysis not only require knowledge of various parameters and data distributions, which are at best partially known, but also run slowly. We present a new mixed integer linear programming approach to the problem of species tree reconstruction. Experimental studies show that the new method is very fast and gives promising results.