The growing speed gap between memory and processor makes efficient use of the cache ever more important for reaching high performance. One of the most important ways to improve cac...
Adaptive mesh refinement (AMR) is a powerful technique that reduces the resources necessary to solve otherwise intractable problems in computational science. The AMR strategy solv...
Michael L. Welcome, Charles A. Rendleman, Leonid O...
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As...
Samuel Williams, John Shalf, Leonid Oliker, Shoaib...
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
This paper presents our experience mapping the OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates...
The discrete wavelet transform (DWT) is used in several image and video compression standards, in particular JPEG2000. A 2D DWT consists of horizontal filtering along the rows fo...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...
There is a continuous research effort devoted to overcoming the memory wall problem. Prefetching is one of the most frequently used techniques. A prefetch mechanism anticipates the ...