Under current worst-case design practices, manufacturers specify conservative values for processor frequencies in order to guarantee correctness. To recover some of the lost perfo...
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream progra...
Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mende...
Transactional Memory (TM) simplifies parallel programming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has required the use of l...
Woongki Baek, Chi Cao Minh, Martin Trautmann, Chri...
The allocation of lock objects to critical sections in concurrent programs affects both performance and correctness. Recent work explores automatic lock allocation, aiming primari...
Richard L. Halpert, Christopher J. F. Pickett, Cla...
The process of verifying a new microprocessor is a major problem for the computer industry. Currently, architects design processors to be fast, power-efficient, and reliable. Howe...
Performance auditing is an online optimization strategy that empirically measures the effectiveness of an optimization on a particular code region. It has the potential to greatly...
Jeremy Lau, Matthew Arnold, Michael Hind, Brad Cal...
Loop nest optimization is a combinatorial problem. Due to the growing complexity of modern architectures, it involves two increasingly difficult tasks: (1) analyzing the profita...
Continued scaling of CMOS technology to smaller transistor sizes makes modern processors more susceptible to both transient and permanent hardware faults. Circuitlevel techniques ...
Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both SIMD instructions and threadlevel parallelism. In this...