It is estimated that ninety percent of the world’s species have yet to be discovered and described. The main reason for the slow pace of new species description is that taxonomy, as traditionally practiced, can be very laborious. To formally describe a new species, taxonomists must manually gather and analyze data from large numbers of specimens, often collected across broad geographic areas, and identify the smallest subset of external body characters that uniquely diagnoses the new species as distinct from all of its known relatives. In this paper, we use an automated feature selection and classification approach to address this taxonomic impediment to new species discovery. The proposed computational framework can identify body shape characters that unite populations within a species, as well as characters that distinguish among species. It also provides statistical “clues” to assist taxonomists in identifying new species or subspecies.
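The abstract does not say which feature selection method or classifier the framework uses. To illustrate the general idea of searching for a small subset of characters that separates species, the sketch below substitutes a simple technique: greedy forward selection, where each candidate character subset is scored by the leave-one-out accuracy of a nearest-centroid classifier. This is an assumption chosen for illustration, not the authors' method. The function names (`greedy_select`, `loo_accuracy`, `centroids`) and the toy measurements are hypothetical.

```python
from math import dist
from statistics import mean


def centroids(X, y, feats):
    """Mean value of the selected characters (feature indices) for each species label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append([row[f] for f in feats])
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `feats`."""
    correct = 0
    for i in range(len(X)):
        cents = centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:], feats)
        x = [X[i][f] for f in feats]
        pred = min(cents, key=lambda lab: dist(x, cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def greedy_select(X, y, target=1.0):
    """Add one character at a time until accuracy reaches `target` or stops improving."""
    selected, remaining, best_acc = [], list(range(len(X[0]))), 0.0
    while remaining and best_acc < target:
        scores = {f: loo_accuracy(X, y, selected + [f]) for f in remaining}
        f = max(scores, key=scores.get)  # ties go to the lowest-index character
        if scores[f] <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = scores[f]
    return selected, best_acc


# Hypothetical measurements: three body characters for two species, A and B.
X = [
    [5.0, 1.0, 3.0], [5.2, 1.1, 2.0], [4.9, 0.9, 4.0],
    [5.1, 3.0, 3.5], [5.0, 3.1, 2.5], [4.8, 2.9, 3.0],
]
y = ["A", "A", "A", "B", "B", "B"]
print(greedy_select(X, y))  # → ([1], 1.0)
```

In this toy data, character 1 alone separates the two species. The greedy search therefore stops after a single character, which is a "smallest diagnostic subset" in this simplified sense. Greedy search is not guaranteed to find the true minimum subset. An exhaustive or regularized approach would be needed for that.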
Yixin Chen, Henry L. Bart Jr., Shuqing Huang, Huim