While technology trends have ushered in the age of chip multiprocessors (CMP) and enabled designers to place an increasing number of cores on chip, a fundamental question is what ...
Divya Gulati, Changkyu Kim, Simha Sethumadhavan, S...
We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of ...
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govi...
Recently, chip multiprocessors (CMPs) have arisen as the de facto design for modern high-performance processors, with increasing core counts. An important property of CMPs is that...
One of the essential features in modern computer systems is context switching, which allows multiple threads of execution to time-share a limited number of processors. While very ...
Fang Liu, Fei Guo, Yan Solihin, Seongbeom Kim, Abd...
Both commercial and scientific workloads benefit from concurrency and exhibit data sharing across threads/processes. The resulting sharing patterns are often fine-grain, with t...
Hemayet Hossain, Sandhya Dwarkadas, Michael C. Hua...
Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of progr...
Manman Ren, Ji Young Park, Mike Houston, Alex Aike...
This work proposes and evaluates improvements to previously known algorithms for redundancy elimination. Enhanced Scalar Replacement combines two classic techniques, scalar replac...
Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneo...
Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aa...