The traditional approach to parallelizing linear algebra algorithms such as matrix multiplication and LU factorization calls for static allocation of matrix blocks to processing elements (PEs). Such algorithms suffer from two drawbacks: they are very sensitive to load imbalances between PEs, and they make it difficult to take advantage of pipelining opportunities. This paper describes dynamic versions of linear algebra algorithms, in which subtasks (matrix block multiplication, matrix block LU factorization) are dynamically allocated to PEs, and analyses their performance theoretically. Its contribution is to show that dynamic pipelined linear algebra algorithms can be specified compactly in CAP and yet achieve good performance. CAP is a C++ language extension for specifying parallel applications as macro-dataflow graphs; the model is general and supports pipelining.
Marc Mazzariol, Benoit A. Gennart, Vincent Messerl
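The contrast between static and dynamic allocation of block subtasks can be illustrated with a minimal sketch in plain C++. This is not CAP code and not the authors' implementation: it uses `std::thread` workers in shared memory as stand-ins for PEs, and the names `multiplyBlock` and `dynamicBlockMultiply` are hypothetical. Each worker repeatedly claims the next unprocessed output block from a shared counter, so a slow worker simply processes fewer blocks instead of stalling the others.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: square n x n matrices stored row-major in a flat vector.
using Matrix = std::vector<double>;

// One subtask: compute output block (bi, bj) of C = A * B.
// Hypothetical helper, not part of CAP.
void multiplyBlock(const Matrix& A, const Matrix& B, Matrix& C,
                   std::size_t n, std::size_t bs,
                   std::size_t bi, std::size_t bj) {
    for (std::size_t i = bi * bs; i < (bi + 1) * bs; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = bj * bs; j < (bj + 1) * bs; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}

// Dynamic allocation: workers pull block indices from a shared atomic
// counter instead of being assigned a fixed set of blocks up front.
Matrix dynamicBlockMultiply(const Matrix& A, const Matrix& B,
                            std::size_t n, std::size_t bs,
                            unsigned numWorkers) {
    assert(bs > 0 && n % bs == 0);  // sketch assumes bs divides n
    assert(numWorkers > 0);

    const std::size_t blocksPerDim = n / bs;
    const std::size_t numTasks = blocksPerDim * blocksPerDim;
    Matrix C(n * n, 0.0);
    std::atomic<std::size_t> nextTask{0};

    auto worker = [&]() {
        for (;;) {
            // Claim the next unprocessed block; each index is handed out once.
            const std::size_t t = nextTask.fetch_add(1);
            if (t >= numTasks) break;
            // Output blocks are disjoint, so writes to C do not conflict.
            multiplyBlock(A, B, C, n, bs, t / blocksPerDim, t % blocksPerDim);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < numWorkers; ++w) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return C;
}
```

A static scheme would instead fix the mapping from block index to worker before the computation starts. The sketch covers only the load-balancing aspect; it does not model pipelining of communication and computation or distributed-memory PEs.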