CONCURRENCY
1998

A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

Download www.netlib.org

We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
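
The abstract names block cyclic distribution and an "LCM block" concept without spelling either out. The sketch below is not taken from the paper. It only illustrates the standard block-cyclic ownership rule (block k goes to process k mod the number of processes) and the arithmetic fact that, on a P x Q process grid, the pair of owners (k mod P, k mod Q) repeats with period lcm(P, Q). The function names `owner`, `lcm`, and `group_by_owner_pair`, and the grid size 2 x 3, are hypothetical choices for illustration.

```python
from collections import defaultdict
from math import gcd


def owner(block_index: int, num_procs: int) -> int:
    """Process index that owns a block under 1-D block-cyclic distribution."""
    return block_index % num_procs


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def group_by_owner_pair(num_blocks: int, p: int, q: int) -> dict:
    """Group block indices k by the pair (k mod p, k mod q).

    Indices in the same group are held by the same process row and the
    same process column, so they are lcm(p, q) apart from one another.
    """
    groups = defaultdict(list)
    for k in range(num_blocks):
        groups[(owner(k, p), owner(k, q))].append(k)
    return dict(groups)


if __name__ == "__main__":
    P, Q = 2, 3  # assumed process grid shape, for illustration only
    print("period:", lcm(P, Q))
    for pair, indices in sorted(group_by_owner_pair(12, P, Q).items()):
        print(pair, indices)
```

Under these assumptions, blocks whose indices differ by a multiple of lcm(P, Q) land on the same process row and column, which is one plausible reading of why grouping such blocks lets each process call a sequential BLAS routine on larger operands even when individual blocks are small.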

Jaeyoung Choi

Block Cyclic Data | CONCURRENCY 1998 | Matrix Multiplication | Scalable Matrix Multiplication

Post Info

Added	22 Dec 2010
Updated	22 Dec 2010
Type	Journal
Year	1998
Where	CONCURRENCY
Authors	Jaeyoung Choi
