In this paper, we present techniques and algorithms to improve the performance of various communication patterns on message-passing platforms where, for reasons of safety, user-le...
In this paper we use Dijkstra’s algorithm as a challenging, hard to parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture....
Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth...
Andy Yoo, Edmond Chow, Keith W. Henderson, Will Mc...
In the study of PetaFlop project, Processor-In-Memory array was proposed to be a target architecture in achieving 1015 floating point operations per second computing performance. ...
Yi Tian, Edwin Hsing-Mean Sha, Chantana Chantrapor...
The increasing demand for computational cycles is being met by the use of multi-core processors. Having large number of cores per node necessitates multi-core aware designs to ext...
Krishna Chaitanya Kandalla, Hari Subramoni, Gopala...