Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed memory hierarchy enhancements for coher...
Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming r...
Stephen Somogyi, Thomas F. Wenisch, Anastasia Aila...
Despite a large research effort, software distributed shared memory systems have not been widely used to run parallel applications across clusters of computers. The higher perform...
Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduct...
In this paper we present an analytical-based framework for parallel program performance prediction. The main thrust of this work is to provide a means for treating realistic appli...