Much research has been devoted to making microprocessors energy-efficient. However, little attention has been paid to multiprocessor environments where, due to the co-operative nature of the computation, the most energy-efficient execution in each processor may not translate into the most energyefficient overall execution. We present the thrifty barrier, a hardware-software approach to saving energy in parallel applications that exhibit barrier synchronization imbalance. Threads that arrive early to a thrifty barrier pick among existing low-power processor sleep states based on predicted barrier stall time and other factors. We leverage the coherence protocol and propose small hardware extensions to achieve timely wake-up of these dormant threads, maximizing energy savings while minimizing the impact on performance.
Jian Li, José F. Martínez, Michael C