Synchronization operations, such as fence and locking, are used in many parallel operations accessing shared memory. However, a process which is blocked waiting for a fence operat...
Darius Buntinas, Amina Saify, Dhabaleswar K. Panda...
As bioinformatics is an emerging application of high performance computing, this paper first evaluates the memory performance of several representative bioinformatics application...
Guangming Tan, Lin Xu, Shengzhong Feng, Ninghui Su...
This paper investigates the design of parallel algorithmic strategies that address the efficient use of both, memory hierarchies within each processor and a multilevel clustered ...
Frank K. H. A. Dehne, Stefano Mardegan, Andrea Pie...
Compiler technology for multimedia extensions must effectively utilize not only the SIMD compute engines but also the various levels of the memory hierarchy: superword registers,...
Chun Chen, Jaewook Shin, Shiva Kintali, Jacqueline...
To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. ...
William Jalby, Christophe Lemuet, Sid Ahmed Ali To...