The known fast sequential algorithms for multiplying two $N \times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that for all $1 \le p \le N^\alpha$, multiplying two $N \times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O(N^\alpha/p + (N^2/p^{2/\alpha}) \log p)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system.
Keqin Li, Victor Y. Pan
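
As a rough illustration of the bound stated in the abstract, the following sketch evaluates the expression $N^\alpha/p + (N^2/p^{2/\alpha}) \log p$ at a few processor counts. It is not part of the paper: the function name `larpbs_bound`, the choice of base-2 logarithm, and the omission of the constant hidden by the $O(\cdot)$ notation are all assumptions made here, so the numbers indicate growth trends only, not running times.

```python
import math


def larpbs_bound(n: int, p: float, alpha: float = 2.3755) -> float:
    """Evaluate N^alpha/p + (N^2 / p^(2/alpha)) * log2(p).

    Hypothetical helper: constant factors are ignored and the base of the
    logarithm (2) is an assumption; the abstract only gives the asymptotic form.
    """
    if not 1 <= p <= n ** alpha:
        raise ValueError("the bound is stated for 1 <= p <= N^alpha")
    first_term = n ** alpha / p                               # N^alpha / p
    second_term = n ** 2 / p ** (2 / alpha) * math.log2(p)    # (N^2 / p^(2/alpha)) log p
    return first_term + second_term


if __name__ == "__main__":
    n = 1024
    for p in (1, 64, 4096, n ** 2.3755):
        print(f"p = {p:.3g}: bound = {larpbs_bound(n, p):.4g}")
```

At the two ends of the allowed range the expression simplifies: with $p = 1$ the logarithm vanishes and the bound reduces to the sequential $N^\alpha$, while with $p = N^\alpha$ it becomes $1 + \alpha \log N$, that is, logarithmic in $N$.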