Abstract. Conventional performance environments are based on profiling and event instrumentation. It becomes problematic as parallel systems scale to hundreds of nodes and beyond. A...
Xian-He Sun, Mario Pantano, Thomas Fahringer, Zhao...
Traditional parallel compilers do not effectively parallelize irregular applications because they contain little loop-level parallelism due to ambiguous memory references. We explo...
Many biologically motivated problems are expressed as dynamic programming recurrences and are difficult to parallelize due to the intrinsic data dependencies in their algorithms. ...
Narayan Ganesan, Roger D. Chamberlain, Jeremy Buhl...
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...