We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on mu...
Pablo Montesinos, Matthew Hicks, Samuel T. King, J...
Mispredicted branches and loads that miss in the cache cause the majority of retirement stalls experienced by sequential processors; we call these critical instructions. Despite t...
Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional stat...
Austen McDonald, JaeWoong Chung, Brian D. Carlstro...
With rapid technological advances in network infrastructure, programming languages, compatible component interfaces and so many more areas, today the computational Grid has evolve...