Compared with a complex single-processor system, on-chip multiprocessors are less complex, more power-efficient, and easier to test and validate. In this work, we focus on an on-chip multiprocessor in which each processor has a local memory (or cache). We demonstrate that, in such an architecture, allowing each processor to issue off-chip memory requests on behalf of other processors can improve overall performance over a straightforward strategy in which each processor issues its off-chip requests independently. Our experimental results, obtained using six benchmark codes, show large savings in execution cycles across a wide range of architectural configurations.
Guangyu Chen, Mahmut T. Kandemir, Alok N. Choudhar