This work details a performance study of six different types of commodity memories in two commodity server nodes. A number of micro-benchmarks are used that measure low-level perf...
Darren J. Kerbyson, Michael Lang 0003, Gene Patino...
Irregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Runtime ...
As superscalar processors become increasingly wide, it is inevitable that the large set of instructions to be fetched every cycle will span multiple noncontiguous basic blocks. Th...
Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in a fewer machine cycles. When we use it...
Feedback-directed optimization has become an increasingly important tool in designing and building optimizing compilers. Recently, reuse-distance analysis has shown much promise i...