The Jaccard/Tanimoto coefficient is an important workload used in a wide variety of problems, including drug design fingerprinting, cluster analysis, web similarity searching, and image segmentation. This paper evaluates the Jaccard coefficient on the Cell/B.E.™ processor and the Intel® Xeon® dual-core platform. We have developed a novel parallel algorithm for all-to-all Jaccard comparisons, specially suited to the Cell/B.E. architecture, that minimizes DMA transfers and reuses data in the local store. We show that our implementation on Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy and by 10-50X in reduced-accuracy mode, depending on the size of the data. In addition to performance, we discuss in detail our efforts to optimize the workload on both the Cell/B.E. and the Intel architectures, and explain how avenues for optimization on each architecture are very different and vary from one architecture to ...
Vipin Sachdeva, Douglas M. Freimuth, Chris Mueller
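
For readers unfamiliar with the workload, the following is a minimal reference sketch of an all-to-all Jaccard/Tanimoto comparison over bit fingerprints. It is not the parallel Cell/B.E. algorithm described in the abstract: it does no DMA blocking, local-store reuse, or SIMD population counting. The function names `tanimoto` and `all_to_all`, the use of Python integers as fingerprints, and the convention that two empty fingerprints have similarity 1.0 are assumptions made purely for illustration.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Jaccard/Tanimoto coefficient of two bit fingerprints stored as ints.

    Defined as |A AND B| / |A OR B|, where |.| counts the set bits.
    """
    union_bits = bin(a | b).count("1")
    if union_bits == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return bin(a & b).count("1") / union_bits


def all_to_all(fingerprints):
    """Return the symmetric similarity matrix for every pair of fingerprints."""
    n = len(fingerprints)
    # The diagonal is 1.0 because each fingerprint is identical to itself.
    sim = [[1.0] * n for _ in range(n)]
    # The coefficient is symmetric, so each unordered pair is computed once.
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return sim
```

The pairwise loop performs n(n-1)/2 comparisons, which is why data reuse and transfer costs come to dominate for large fingerprint sets.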