Sciweavers

SPAA
2012
ACM
A scalable framework for heterogeneous GPU-based clusters
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance...
Fengguang Song, Jack Dongarra
PLDI
1998
ACM
The Implementation of the Cilk-5 Multithreaded Language
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely r...
Matteo Frigo, Charles E. Leiserson, Keith H. Randa...
IPPS
2009
IEEE
Minimizing startup costs for performance-critical threading
Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically sc...
Anthony M. Castaldo, R. Clint Whaley
ECOOP
2010
Springer
Self-Replicating Objects for Multicore Platforms
The paper introduces Self-Replicating Objects (SROs), a new concurrent programming abstraction. An SRO is implemented and used much like an ordinary .NET object and can expose arbitrary us...
Krzysztof Ostrowski, Chuck Sakoda, Ken Birman
SC
1995
ACM
Communication Optimizations for Parallel Computing Using Data Access Information
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of ...
Martin C. Rinard