We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for s...
Bryan Marker, Field G. Van Zee, Kazushige Goto, Gr...
While trace cache, value prediction, and prefetching have been shown to be effective in the single-threaded superscalar, there has been no analysis of these techniques in a Simulta...
The Cell processor is a heterogeneous multi-core processor with one Power Processing Engine (PPE) core and eight Synergistic Processing Engine (SPE) cores. Each SPE has a directly...
Kevin O'Brien, Kathryn M. O'Brien, Zehra Sura, Ton...
The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, handcoded microbenchmarks can be used to accelerate performance e...
With the multitude of existing and upcoming wireless standards, it is becoming increasingly difficult for hardware-only baseband processing solutions to adapt to the rapidly chan...
Mark Woh, Yuan Lin, Sangwon Seo, Scott A. Mahlke, ...