Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency...
Cycles per Instruction (CPI) stacks break down processor execution time into a baseline CPI plus a number of miss event CPI components. CPI breakdowns can be very helpful in gaini...
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small k...
Mahim Mishra, Timothy J. Callahan, Tiberiu Chelcea...
This paper explores hierarchical instruction scheduling for a tiled processor. Our results show that at the top level of the hierarchy, a simple profile-driven algorithm effective...
Martha Mercaldi, Steven Swanson, Andrew Petersen, ...
Debugging data races in parallel applications is a difficult task. Error-causing data races may appear to vanish due to changes in an application's optimization level, thread...
Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Peter...