We describe a software solution to the problem of automatic parallelization of linear algebra code on multi-processor and multi-core architectures. This solution relies on the defi...
The efforts of an expert to parallelize and optimize a dense linear algebra algorithm for distributed-memory targets are largely mechanical and repetitive. We demonstrate that the...
Bryan Marker, Andy Terrel, Jack Poulson, Don S. Ba...
It is our belief that the ultimate automatic system for deriving linear algebra libraries should be able to generate a set of algorithms starting from the mathematical specificati...
Paolo Bientinesi, Sergey Kolos, Robert A. van de G...
Strassen’s matrix multiplication (MM) has an advantage over any (highly tuned) implementation of MM because Strassen’s algorithm reduces the total number of operations. Strasse...
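To make the operation-count claim concrete, here is a minimal pure-Python sketch of the textbook Strassen recursion. It is not the implementation from the abstract above. It assumes square matrices whose size is a power of two, stored as lists of lists. The helper names (`mat_add`, `mat_sub`, `naive_mm`, `split`, `strassen`, `scalar_mults`) and the `cutoff` parameter are hypothetical and chosen only for illustration.

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mm(A, B):
    """Classical O(n^3) product of two n x n matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=1):
    """Strassen product; assumes n is a power of two (illustrative only)."""
    n = len(A)
    if n <= cutoff:
        return naive_mm(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the eight used by the classical
    # block algorithm.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22), cutoff)
    M2 = strassen(mat_add(A21, A22), B11, cutoff)
    M3 = strassen(A11, mat_sub(B12, B22), cutoff)
    M4 = strassen(A22, mat_sub(B21, B11), cutoff)
    M5 = strassen(mat_add(A11, A12), B22, cutoff)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12), cutoff)
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22), cutoff)
    # Recombine the products into the four quadrants of the result.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def scalar_mults(n, cutoff=1):
    """Scalar multiplications performed by strassen() for a power-of-two n."""
    if n <= cutoff:
        return n ** 3
    return 7 * scalar_mults(n // 2, cutoff)
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, and `scalar_mults(4)` gives 49, against 64 for the classical algorithm. The saving comes at the cost of extra additions and temporaries, which is why a practical code would usually raise `cutoff` and fall back to a tuned MM kernel for small blocks.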