Sciweavers

» Scalable Parallel Matrix Multiplication on Distributed Memor...
IPPS
2010
IEEE
13 years 6 months ago
KRASH: Reproducible CPU load generation on many-core machines
In this article we present KRASH, a tool for reproducible generation of system-level CPU load. This tool is intended for use in shared memory machines equipped with multi...
Swann Perarnau, Guillaume Huard
PVM
2007
Springer
14 years 2 months ago
Revealing the Performance of MPI RMA Implementations
The MPI remote-memory access (RMA) operations provide a different programming model from the regular MPI-1 point-to-point operations. This model is particularly appropriate for ca...
William D. Gropp, Rajeev Thakur
ICPP
2006
IEEE
14 years 2 months ago
Data Transfers between Processes in an SMP System: Performance Study and Application to MPI
This paper focuses on the transfer of large data in SMP systems. Achieving good performance for intranode communication is critical for developing an efficient communication s...
Darius Buntinas, Guillaume Mercier, William Gropp
CW
2002
IEEE
14 years 1 month ago
Efficient Data Compression Methods for Multi-Dimensional Sparse Array Operations
For sparse array operations, in general, the sparse arrays are compressed by some data compression schemes in order to obtain better performance. The Compressed Row/Column Storage...
Chun-Yuan Lin, Yeh-Ching Chung, Jen-Shiuh Liu
HPCA
2009
IEEE
14 years 9 months ago
Express Cube Topologies for on-Chip Interconnects
Driven by continuing scaling of Moore's law, chip multiprocessors and systems-on-a-chip are expected to grow the core count from dozens today to hundreds in the near future. ...
Boris Grot, Joel Hestness, Stephen W. Keckler, Onu...