Sciweavers

481 search results - page 28 / 97
» Performance Modeling and Measurement of Parallelized Code fo...
Sort
View
ICS
2007
Tsinghua U.
14 years 1 months ago
Scheduling FFT computation on SMP and multicore systems
Increased complexity of memory systems to ameliorate the gap between the speed of processors and memory has made it increasingly harder for compilers to optimize an arbitrary code...
Ayaz Ali, S. Lennart Johnsson, Jaspal Subhlok
IPPS
1999
IEEE
13 years 12 months ago
Experimental Evaluation of QSM, a Simple Shared-Memory Model
Parallel programming models should attempt to satisfy two conflicting goals. On one hand, they should hide architectural details so that algorithm designers can write simple, port...
Brian Grayson, Michael Dahlin, Vijaya Ramachandran
ISCA
1997
IEEE
90views Hardware» more  ISCA 1997»
13 years 11 months ago
The Interaction of Software Prefetching with ILP Processors in Shared-Memory Systems
Current microprocessors aggressively exploit instructionlevel parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work ...
Parthasarathy Ranganathan, Vijay S. Pai, Hazim Abd...
MICRO
2006
IEEE
113views Hardware» more  MICRO 2006»
13 years 7 months ago
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
We examine the ability of CMPs, due to their lower onchip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vec...
Jack Sampson, Rubén González, Jean-F...
HPCA
1999
IEEE
13 years 12 months ago
Limits to the Performance of Software Shared Memory: A Layered Approach
Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that...
Angelos Bilas, Dongming Jiang, Yuanyuan Zhou, Jasw...