Sciweavers

779 search results - page 147 / 156
» A Simple Program Transformation for Parallelism
Sort
View
ASPLOS
2010
ACM
14 years 2 months ago
Flexible architectural support for fine-grain scheduling
To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit fine-grain parallelism. However, managing tasks of a few thousand instructions is ...
Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
PLDI
2005
ACM
14 years 1 months ago
Register allocation for software pipelined multi-dimensional loops
Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level paral...
Hongbo Rong, Alban Douillet, Guang R. Gao
ISCAPDCS
2001
13 years 9 months ago
End-user Tools for Application Performance Analysis Using Hardware Counters
One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an applica...
Kevin S. London, Jack Dongarra, Shirley Moore, Phi...
JSA
2000
116views more  JSA 2000»
13 years 7 months ago
Distributed vector architectures
Integrating processors and main memory is a promising approach to increase system performance. Such integration provides very high memory bandwidth that can be exploited efficientl...
Stefanos Kaxiras
PPL
2011
12 years 10 months ago
Mpi on millions of Cores
Petascale parallel computers with more than a million processing cores are expected to be available in a couple of years. Although MPI is the dominant programming interface today ...
Pavan Balaji, Darius Buntinas, David Goodell, Will...