In this papel; we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that even with an aggressive, next-generation memory syst...
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
—We present Huckleberry, a tool for automatically generating parallel implementations for multi-core platforms from sequential recursive divide-and-conquer programs. The recursiv...
Rebecca L. Collins, Bharadwaj Vellore, Luca P. Car...
Abstract. Developing efficient distributed applications while managing complexity can be challenging. Managing network latency is a key challenge for distributed applications. We ...
: The EM-4 is a supercomputer that offers very fast inter processor communication and support for multi threading. In this paper we demonstrate that the EM-4, Together with an auto...