On large distributed-memory parallel computers, the global communication cost of inner products seriously limits the performance of Krylov subspace methods [3]. We consider improved algorithms that reduce this communication overhead, and we analyze their performance through experiments on a 400-processor parallel computer and with a simple performance model.
Eric de Sturler, Henk A. van der Vorst
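The sketch below illustrates why inner products are costly. It uses a toy model that is an assumption and not the paper's own performance model. Each processor first computes a local partial sum over its block of the vectors. A global reduction then combines these partial sums, and we model that reduction as a binary tree with `ceil(log2 p)` steps. Each step is assumed to cost a startup time `t_start` plus `t_word` per word sent. All function names and parameters are hypothetical. Batching several inner products into one reduction is shown only as one generic way to amortize startup cost, not as the authors' algorithm.

```python
import math

def distributed_dot(x, y, p):
    """Inner product split across p (simulated) processors.

    Each processor sums its own contiguous block; the final sum over the
    partial results stands in for the global reduction (all-reduce).
    """
    n = len(x)
    bounds = [n * i // p for i in range(p + 1)]
    partial = [
        sum(a * b for a, b in zip(x[bounds[i]:bounds[i + 1]], y[bounds[i]:bounds[i + 1]]))
        for i in range(p)
    ]
    return sum(partial)  # global communication happens here on a real machine

def reduction_time(p, words, t_start, t_word):
    """Assumed cost of a binary-tree global sum of `words` values on p processors."""
    steps = math.ceil(math.log2(p)) if p > 1 else 0
    return steps * (t_start + words * t_word)

def separate_vs_combined(p, k, t_start, t_word):
    """Compare k separate one-word reductions with a single k-word reduction."""
    separate = k * reduction_time(p, 1, t_start, t_word)
    combined = reduction_time(p, k, t_start, t_word)
    return separate, combined
```

With `p = 400`, `k = 3`, `t_start = 100.0` and `t_word = 1.0` (arbitrary units), the tree has 9 steps. Three separate reductions then cost 2727.0 units, and one combined reduction costs 927.0 units. Under this model, the startup term dominates when `t_start` is much larger than `t_word`, so paying it once per reduction rather than once per inner product is what makes the difference.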