Sciweavers

164 search results - page 10 / 33
» Data distribution for dense factorization on computers with ...
Sort
View
ICPPW
2009
IEEE
13 years 5 months ago
Characterizing the Performance of
Using Linux for high-performance applications on the compute nodes of IBM Blue Gene/P is challenging because of TLB misses and difficulties with programming the network DMA engine...
Kazutomo Yoshii, Kamil Iskra, Harish Naik, Pete Be...
TSP
2010
13 years 2 months ago
Optimization and analysis of distributed averaging with short node memory
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each node has a scalar data value, and the goal...
Boris N. Oreshkin, Mark Coates, Michael G. Rabbat
PPOPP
1997
ACM
13 years 10 months ago
Shared Memory Performance Profiling
This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with...
Zhichen Xu, James R. Larus, Barton P. Miller
ICS
1999
Tsinghua U.
13 years 11 months ago
Improving memory hierarchy performance for irregular applications
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically underutilize multi-...
John M. Mellor-Crummey, David B. Whalley, Ken Kenn...
IPPS
1998
IEEE
13 years 11 months ago
High Performance Linear Algebra Package LAPACK90
Abstract. LAPACK90 is a set of LAPACK90 subroutines which interfaces FORTRAN90 with LAPACK. All LAPACK driver subroutines including expert drivers and some LAPACK computationals ha...
Jack Dongarra, Jerzy Wasniewski