Left unchecked, the fundamental drive to increase peak performance using tens of thousands of power hungry components will lead to intolerable operating costs and failure rates. High-performance, power-aware distributed computing reduces power and energy consumption of distributed applications and systems without sacrificing performance. Generally, we use DVS (Dynamic Voltage Scaling) technology now available in high-performance microprocessors to reduce power consumption during parallel application runs when peak CPU performance is not necessary due to load imbalance, communication delays, etc. We propose distributed performance-directed DVS scheduling strategies for use in scalable power-aware HPC clusters. By varying scheduling granularity we can obtain significant energy savings without increasing execution time (36% for FT from NAS PB). We created a software framework to implement and evaluate our various techniques and show performance-directed scheduling consistently saves more...
Rong Ge, Xizhou Feng, Kirk W. Cameron