Sciweavers

155 search results - page 29 / 31
» On the Automatic Parallelization of the Perfect Benchmarks
HPDC
2012
IEEE
11 years 10 months ago
Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores
Consider a multithreaded parallel application running inside a multicore virtual machine context that is itself hosted on a multi-socket multicore physical machine. How should the...
Chang Bae, Lei Xia, Peter A. Dinda, John R. Lange
HPCA
2005
IEEE
14 years 8 months ago
Unbounded Transactional Memory
Hardware transactional memory should support unbounded transactions: transactions of arbitrary size and duration. We describe a hardware implementation of unbounded transactional ...
C. Scott Ananian, Krste Asanovic, Bradley C. Kuszm...
ASPLOS
2009
ACM
14 years 8 months ago
Dynamic prediction of collection yield for managed runtimes
The growth in complexity of modern systems makes it increasingly difficult to extract high performance. The software stacks for such systems typically consist of multiple layers a...
Michal Wegiel, Chandra Krintz
ICS
2009
Tsinghua U.
14 years 2 months ago
MPI-aware compiler optimizations for improving communication-computation overlap
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...
LCTRTS
2005
Springer
14 years 1 month ago
Cache aware optimization of stream programs
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly preval...
Janis Sermulins, William Thies, Rodric M. Rabbah, ...