This site uses cookies to deliver our services and to ensure you get the best experience. By continuing to use this site, you consent to our use of cookies and acknowledge that you have read and understand our Privacy Policy, Cookie Policy, and Terms
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity ON , where 2 3. We show that such an algorithm can be parallelize...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA Distribution-Independent Matrix Multiplication Algorithm, for block cyclic data distribution on ...
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. Thes...
In modern clustering environments where the memory hierarchy has many layers (distributed memory, shared memory layer, cache, ¡ ¢ ), an important question is how to fully u...