Abstract— We developed an automated environment to measure the memory access behavior of applications on high performance clusters. Code optimization for processor caches is cruc...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although c...
We developed a multithreaded parallel implementation of a sequence alignment algorithm that is able to align whole genomes with reliable output and reasonable cost. This paper pres...
Juan del Cuvillo, Xinmin Tian, Guang R. Gao, Milin...
Abstract—We present LeWI: a novel load balancing algorithm, that can balance applications with very different patterns of imbalance. Our algorithm can balance fine grain imbalan...