The OrthoMCL database (http://orthomcl.cbil.upenn.edu) houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed by normalization of inter-species differences, and Markov clustering. A
Feng Chen, Aaron J. Mackey, Christian J. Stoeckert
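
The Markov clustering step named in the abstract can be illustrated with a minimal sketch. This is a toy, pure-Python version of the general Markov Cluster (MCL) procedure (alternating expansion and inflation of a column-stochastic matrix), not the OrthoMCL software itself. The function names, the `inflation` default, and the example similarity matrix are assumptions made for illustration, and the BLAST search and the inter-species normalization are not reproduced here.

```python
# Toy Markov clustering on a small symmetric similarity matrix.
# Assumption: weights[i][j] is a non-negative similarity between proteins i and j
# (for example, derived from BLAST scores); all names here are hypothetical.

def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(weights, inflation=2.0, max_iterations=50, tol=1e-9):
    n = len(weights)
    # Add self-loops, a common MCL convention, then normalize.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iterations):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row of the converged matrix lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Hypothetical input: proteins 0-2 are mutually similar, as are proteins 3-4.
example_weights = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
example_clusters = markov_cluster(example_weights)
print(example_clusters)
```

In this sketch the two disconnected similarity blocks come out as two separate groups. Larger `inflation` values generally yield finer-grained clusters, which is why that parameter is exposed.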