Abstract. The ability to offload functionality to a programmable network interface is appealing, both for increasing message-passing performance and for reducing overhead on the host processor(s). Two important features of an MPI implementation are independent progress and the ability to overlap computation with communication. In this paper, we compare the performance of several application benchmarks using an MPI implementation that takes advantage of a programmable NIC to implement MPI semantics against their performance using an implementation that does not. Unlike previous such comparisons, ours uses identical network hardware and virtually the same software stack. This isolates the effect of these two important features of an MPI implementation.
Ron Brightwell, Keith D. Underwood, Rolf Riesen