This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-andconquer algorithms and present a new on-line scheduler, controlled-pdf, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptoticallyoptimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-densevector-multiply achieves provably good cache performance in the multicore-cache model, as well as in ...
Guy E. Blelloch, Rezaul Alam Chowdhury, Phillip B.