This paper contributes and evaluates a model and a methodology for implementing parallel wavefront algorithms on the Cell Broadband Engine. Wavefront algorithms are vital in several application areas, such as computational biology, particle physics, and the solution of systems of linear equations. The model uses blocked data decomposition with pipelined execution of blocks across the synergistic processing elements (SPEs) of the Cell. To evaluate the model, we implement the Smith-Waterman pairwise sequence alignment algorithm as a wavefront algorithm and present key optimization techniques that significantly enhance the vector processing capabilities of the SPE. Our results show perfect linear speedup for up to 16 SPEs on the QS20 dual-Cell blades, and our model is highly scalable to more cores, if available. The model's predictions are within 3% of the measured values on average. We then test our model in a throughput-oriented experimental setting, where we couple our model with scheduling technique...
Ashwin M. Aji, Wu-chun Feng, Filip Blagojevic, Dim
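As an illustration of the blocked wavefront decomposition described in the abstract, the following is a minimal sequential sketch in Python. It is not the authors' Cell implementation: the function name `smith_waterman_blocked`, the block size, the scoring values, and the linear gap penalty are all assumptions chosen for brevity. It shows only the order in which blocks become ready: all blocks on one anti-diagonal depend solely on blocks from earlier anti-diagonals, so they could be handed to separate processing elements.

```python
def smith_waterman_blocked(a, b, block=4, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, filling the DP table block by block.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a), len(b)
    # H carries an extra zero row and column as the boundary condition.
    H = [[0] * (cols + 1) for _ in range(rows + 1)]
    n_block_rows = (rows + block - 1) // block
    n_block_cols = (cols + block - 1) // block
    best = 0
    # Blocks on the same anti-diagonal d = bi + bj are mutually independent:
    # each needs only its top, left, and top-left neighbours, which all lie
    # on earlier anti-diagonals. A parallel version could process the inner
    # loop over bi concurrently; here it runs sequentially.
    for d in range(n_block_rows + n_block_cols - 1):
        first = max(0, d - n_block_cols + 1)
        last = min(d, n_block_rows - 1)
        for bi in range(first, last + 1):
            bj = d - bi
            i_end = min((bi + 1) * block, rows)
            j_end = min((bj + 1) * block, cols)
            for i in range(bi * block + 1, i_end + 1):
                for j in range(bj * block + 1, j_end + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(
                        0,
                        H[i - 1][j - 1] + s,  # diagonal: match or mismatch
                        H[i - 1][j] + gap,    # gap in b
                        H[i][j - 1] + gap,    # gap in a
                    )
                    best = max(best, H[i][j])
    return best
```

Because the block order respects every cell's dependencies, the result does not depend on the block size; calling the function with `block=1` reduces it to a cell-level wavefront sweep.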