Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...
Continuous media playback suffers when a station's operating system offers insufficient I/O throughput. Conventional I/O system structures support a memory-oriented read and ...
Numerical applications frequently contain nested loop structures that process large arrays of data. The execution of these loop structures often produces memory reference pattern...
Yoji Yamada, John Gyllenhaal, Grant Haab, Wen-mei ...
General-purpose microprocessors augmented with SIMD execution units enhance multimedia applications by exploiting data-level parallelism. However, supporting/overhead related inst...
One of the critical goals in code optimization for MPSoC architectures is to minimize the number of off-chip memory accesses. This is because such accesses can be extremely cos...