We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...
Advances in multiprocessor interconnect technologyare leading to high performance networks. However, software overheadsassociated with message passing are limiting the processors ...
Debashis Basak, Dhabaleswar K. Panda, Mohammad Ban...
In large-scale parallel applications a graph coloring is often carried out to schedule computational tasks. In this paper, we describe a new distributedmemory algorithm for doing t...
This paper describes the design and implementation on MIMD parallel machines of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determini...
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and ...