—Bulk memory copying and initialization is one of the most ubiquitous operations performed in current computer systems by both user applications and Operating Systems. While many...
Xiaowei Jiang, Yan Solihin, Li Zhao, Ravishankar I...
Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc...
Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dha...
This paper examines how the performance of a shared-memory multiprocessor can be improved by including hardware support for block transfers. A system similar to the Hector multipr...
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to addres...
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...
Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architect...
Stamatis G. Kavadias, Manolis Katevenis, Michail Z...