Anthony M. Castaldo, R. Clint Whaley, and Anthony T. Chronopoulos

Abstract. This paper discusses both the theoretical and the statistical errors incurred by well-known dot product algorithms, from the canonical to the pairwise algorithm, and introduces a new and more general framework, which we have named superblock, that subsumes them and permits a practitioner to trade off computational performance, memory usage, and error behavior. We show that algorithms with lower error bounds tend to behave noticeably better in practice. Unlike many such error-reducing algorithms, superblock requires no additional floating point operations and should be implementable with little to no performance loss, making it suitable for use as a performance-critical building block of a linear algebra kernel.

Key words. dot product, inner product, error analysis, BLAS, ATLAS

AMS subject classifications. 65G50, 65K05, 65K10, 65Y20, 68-04

DOI. 10.1137/070679946
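To make the blocking idea concrete before the formal development, the following is a minimal sketch in C (ours, not the implementation studied in this paper) of a two-level blocked dot product: each block of NB elements is accumulated canonically, and the block partial sums are combined pairwise. The block size NB, the function names, and the choice to seed each block with its first product are illustrative assumptions.

    #include <stddef.h>
    #include <stdlib.h>
    #include <math.h>

    #define NB 256  /* block size; a tuning parameter, chosen arbitrarily here */

    /* Pairwise (recursive halving) sum of the block partials. */
    static double pairwise_sum(const double *s, size_t n)
    {
        if (n == 1)
            return s[0];
        size_t half = n / 2;
        return pairwise_sum(s, half) + pairwise_sum(s + half, n - half);
    }

    /* Two-level blocked dot product: canonical accumulation within each
     * NB-element block, pairwise combination of the block partials.
     * Uses exactly n multiplies and n - 1 adds, the same flop count as
     * the canonical loop. */
    double blocked_dot(const double *x, const double *y, size_t n)
    {
        if (n == 0)
            return 0.0;
        size_t nblk = (n + NB - 1) / NB;
        double *partial = malloc(nblk * sizeof *partial);
        if (!partial)
            return NAN;  /* allocation failure */
        for (size_t b = 0; b < nblk; b++) {
            size_t lo = b * NB;
            size_t hi = (lo + NB < n) ? lo + NB : n;
            double s = x[lo] * y[lo];          /* seed with first product ... */
            for (size_t i = lo + 1; i < hi; i++)
                s += x[i] * y[i];              /* ... so no add-to-zero is spent */
            partial[b] = s;
        }
        double d = pairwise_sum(partial, nblk);
        free(partial);
        return d;
    }

In this sketch, taking NB = 1 recovers a purely pairwise dot product, while NB >= n recovers the canonical loop, which is the sense in which a blocked family subsumes both extremes.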