Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
Computing ”in-place and in-order”FFT poses a very difficult problem on hierarchical memory architectures where data movement can seriously degrade the performance. In this pape...
While commodity computing and graphics hardware has increased in capacity and dropped in cost, it is still quite difficult to make effective use of such systems for general-purpos...
E. Wes Bethel, Greg Humphreys, Brian E. Paul, J. D...
Emerging heterogeneous multiprocessors will have custom memory and bus architectures that must balance resource sharing and system partitioning to meet cost constraints. We propos...
Our goal is to develop a robust out-of-core sorting program for a distributed-memory cluster. The literature contains two dominant paradigms for out-of-core sorting algorithms: me...