Software distributed shared memory (DSM) improves the programmability of message-passing machines and workclusters by providing a shared memory abstract (i.e., a coherent global a...
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache con...
Hardware signatures have been recently proposed as an efficient mechanism to detect conflicts amongst concurrently running transactions in transactional memory systems (e.g., Bulk...
The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditi...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
Software-controlled cache prefetching and data forwarding are widely used techniques for tolerating memory latency in shared memory multiprocessors. However, some previous studies...