In this paper we describe a new technique, called pipeline spectroscopy, and use it to measure the cost of each cache miss. The cost of a miss is displayed (graphed) as a histogra...
Thomas R. Puzak, Allan Hartstein, Philip G. Emma, ...
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately,these twoobjectives have usuallybeen considered independentl...
We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, whi...
Binary Decision Diagrams BDDs are e cient at manipulating large sets in a compact manner. BDDs, however, are inefcient at utilizing the memory hierarchy of the computer. Recent ...
Abstract. As the speed gap between CPU and memory widens, memory hierarchy has become the performance bottleneck for most applications because of both the high latency and low band...
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically underutilize multi-...
John M. Mellor-Crummey, David B. Whalley, Ken Kenn...
Using off-the-shelf commodity workstations and PCs to build a cluster for parallel computing has become a common practice. A choice of a cost-effective cluster computing platform ...
We study the issue of performance prediction on the SGIPower Challenge, a typical SMP. On such a platform, the cost of memory accesses depends on their locality and on contention ...
Nancy M. Amato, Jack Perdue, Mark M. Mathis, Andre...
Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance...
John M. Mellor-Crummey, Robert J. Fowler, David B....
The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transf...