Sciweavers

1053 search results - page 194 / 211
» Estimating Execution Time of Distributed Applications
Sort
View
99
Voted
SPAA
2009
ACM
16 years 3 months ago
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures
Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but...
Daniel Spoonhower, Guy E. Blelloch, Phillip B. Gib...
HPCA
2009
IEEE
16 years 3 months ago
PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches
As the last-level on-chip caches in chip-multiprocessors increase in size, the physical locality of on-chip data becomes important for delivering high performance. The non-uniform...
Mainak Chaudhuri
ICS
2009
Tsinghua U.
15 years 9 months ago
MPI-aware compiler optimizations for improving communication-computation overlap
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...
98
Voted
IPPS
2006
IEEE
15 years 8 months ago
Conjugate gradient sparse solvers: performance-power characteristics
We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver which is widely used in scientific applications. We use cycle-accurate simulatio...
Konrad Malkowski, Ingyu Lee, Padma Raghavan, Mary ...
ICS
1995
Tsinghua U.
15 years 6 months ago
A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality
Current data cache organizations fail to deliver high performance in scalar processors for many vector applications. There are two main reasons for this loss of performance: the u...
Antonio González, Carlos Aliagas, Mateo Val...