We introduce new distance measures for the construction and analysis of phylogenies, focusing on thioredoxin-fold proteins. Our distance measures for tree construction are based on several criteria: pairwise alignment restricted to the thioredoxin fold region of each sequence; the Hausdorff distance between sequences, where each sequence is represented as a set of real vectors derived from its per-residue features; and properties of each sequence, such as protein function and organism type. We also analyze and compare the resulting trees in several ways. To corroborate the trees, we first compute the distances between them and then evaluate them using conditional entropy. We further analyze the trees by finding common subtrees within and between them. Finally, biological analysis shows that trees based on our measures yield new information on proteins within the thioredoxin superfamily.
Chang Wang, Stephen D. Scott, Qingping Tao, Dmitri
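
To make the Hausdorff-based criterion concrete, the following is a minimal sketch of the classical symmetric Hausdorff distance between two finite sets of real vectors. It assumes Euclidean distance between vectors and uses made-up two-dimensional feature vectors; the abstract does not specify which per-residue features are used or whether a variant of the Hausdorff distance is applied, so the names `directed_hausdorff`, `hausdorff`, `seq_a`, and `seq_b` are purely illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def directed_hausdorff(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Largest distance from any vector in `a` to its nearest vector in `b`."""
    return max(min(math.dist(x, y) for y in b) for x in a)


def hausdorff(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    if not a or not b:
        raise ValueError("both vector sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical per-residue feature vectors for two sequences (assumed values,
# not taken from the paper). Each sequence is a set of vectors, one per residue.
seq_a = [(0.0, 0.0), (1.0, 0.0)]
seq_b = [(0.0, 0.0), (4.0, 0.0)]

print(hausdorff(seq_a, seq_b))  # 3.0
```

In this toy example the directed distance from `seq_a` to `seq_b` is 1.0, while the reverse direction gives 3.0, so the symmetric distance is 3.0. A matrix of such pairwise distances over all sequences could then be passed to a standard distance-based tree-building method.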