Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster...
Jonathan W. Berry, Bruce Hendrickson, Simon Kahan,...
There has been intensive research on data prefetching focusing on performance improvement, however, the energy aspect of prefetching is relatively unknown. Our experiments show th...
Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Kri...
Hardware support for dynamic analysis can minimize the performance overhead of useful applications such as security checks, debugging, and profiling. To eliminate implementation ...
Multicore is now the dominant processor trend, and the number of cores is rapidly increasing. The paradigm shift to multicore forces the redesign of the software stack, which incl...
Communication in cache-coherent distributed shared memory (DSM) often requires invalidating (or writing back) cached copies of a memory block, incurring high overheads. This paper...