Several methods have been proposed in the literature for the distribution of data on distributed-memory machines, oriented to either dense or sparse structures. Many real applications, however, deal with both kinds of data jointly. This paper presents techniques for integrating dense and sparse array accesses in a way that optimizes locality and also enables efficient loop partitioning within a data-parallel compiler. Our approach is evaluated through an experimental survey on several compilers and parallel platforms. The results demonstrate the benefits of the BRS sparse distribution when combined with CYCLIC in mixed algorithms, as well as the poor efficiency achieved by well-known distribution schemes when sparse elements arise in the source code.
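As a rough illustration (an assumption for exposition, not taken from the paper), the locality argument rests on dense CYCLIC and sparse BRS assigning an element with the same global indices (i, j) to the same processor of a 2D mesh, so dense and sparse accesses that share indices stay local. The mesh dimensions below are hypothetical.

```c
/* Minimal sketch: ownership of global element (i, j) under a
 * CYCLIC(1) distribution over a PROWS x PCOLS processor mesh.
 * A BRS-style placement of a sparse nonzero at (i, j) on the same
 * processor keeps it aligned with the dense array element A(i, j). */
#include <stdio.h>

#define PROWS 2   /* assumed mesh dimensions, for illustration only */
#define PCOLS 2

/* Mesh coordinates of the processor owning global element (i, j). */
static void cyclic_owner(int i, int j, int *pr, int *pc)
{
    *pr = i % PROWS;
    *pc = j % PCOLS;
}

int main(void)
{
    int pr, pc;
    cyclic_owner(5, 3, &pr, &pc);
    printf("element (5,3) -> processor (%d,%d)\n", pr, pc);
    return 0;
}
```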
Gerardo Bandera, Manuel Ujaldon, María A. T