The memory consistency model supported by a multiprocessor architecture determines the amount of buffering and pipelining that may be used to hide or reduce the latency of memory ...
Kourosh Gharachorloo, Anoop Gupta, John L. Henness...
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshalling etc. Traditional sequential algorithms use a worklist, re...
Aggressive hardware-based and software-based prefetch algorithms for hiding memory access latencies were proposed to bridge the gap of the expanding speed disparity between proces...
Many emerging processor microarchitectures seek to manage technological constraints (e.g., wire delay, power, and circuit complexity) by resorting to nonuniform designs that provi...