Region-based compilation repartitions a program into more desirable compilation units for optimization and scheduling, particularly beneficial for ILP architectures. With region-...
Multimedia instruction set extensions have become a prominent feature in desktop microprocessor platforms, promising superior performance on a wide range of floating-point and int...
Branch prediction accuracy is a very important factor for superscalarprocessor performance. The ability topredict the outcome of a branch allows the processor to effectively use a...
Recent research suggests that DSM clusters can benefit from parallel coherence controllers. Parallel controllers require address partitioning and synchronization to avoid handlin...
ÐThis paper presents a multithreaded abstract machine for the TyCO process calculus. We argue that process calculi provide a powerful framework to reason about fine-grained parall...
As on-chip integration matures, single-chip system designers must not only be concerned with component-level issues such as performance and power, but also with onchip system-leve...
Recent digital signal processors (DSPs) show a homogeneous VLIW-like data path architecture, which allows C compilers to generate efficient code. However, still some special rest...
In this paper, we look at two issues which could affect the performance of value prediction on wide-issue ILP processors. One is the large number of accesses to the value predicti...
A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group of lines, a neighborhood, surrounding the demand-fetched line. The neighborhood ...
The performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse un...