While many-core accelerator architectures, such as today’s Graphics Processing Units (GPUs), offer orders of magnitude more raw computing power than contemporary CPUs, their massive parallelism often produces complex dynamic behaviors even with the simplest applications. Using a fixed set of hardware or simulator performance counters to quantify behavior over a large interval of time, such as an entire application execution run or program phase, may not capture this behavior. Software and/or hardware designers may consequently miss opportunities to optimize for better performance. Similarly, significant effort may be expended to find metrics that explain anomalous behavior in architecture design studies. Moreover, the increasing complexity of applications developed for today’s GPUs has created additional difficulties for software developers when attempting to identify bottlenecks of an application for optimization. This paper presents a novel GPU performance visualizatio...
Aaron Ariel, Wilson W. L. Fung, Andrew E. Turner,