Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
—In this paper, we present a design architecture of implementing a ”Vector Bank” into video encoder system, namely, an H.264 encoder, in order to detect and analyze the movin...
This paper describes improvements to the Mach microkernel’s support for efficient application startup across multiple nodes in a cluster or massively parallel processor. Signifi...
Dejan S. Milojicic, David L. Black, Steven J. Sear...
MATLAB is one of the most popular languages for desktop numerical computations as well as for signal and image processing applic ations. Applying parallel processing techniques to...