Parallel file subsystems in today’s high-performance computers adopt many I/O optimization strategies that were designed for distributed systems. These strategies, for instance...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...
Two broad classes of memory models are available today: models with hardware cache coherence, used in conventional chip multiprocessors, and models that rely upon software to mana...
John H. Kelm, Daniel R. Johnson, William Tuohy, St...
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to addres...
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous ...
Sridhar Gopal, T. N. Vijaykumar, James E. Smith, G...
Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching fre...