All-to-all broadcast is a common collective operation that involves dense communication among all processes in a parallel program. Programmable Network Interface Cards (NICs) have previously been leveraged to efficiently support collective operations, including barrier, broadcast, and reduce. This paper explores the characteristics of all-to-all broadcast and proposes new algorithms that exploit the potential advantages of NIC programmability. Along with these algorithms, several strategies are used to provide scalable topology management, global buffer management, efficient communication processing, and message reliability. The algorithms have been incorporated into a NIC-based collective protocol over Myrinet/GM. The NIC-based all-to-all broadcast operations improve all-to-all broadcast bandwidth over 16 nodes by a factor of 3, compared to the host-based all-to-all broadcast operation. Furthermore, the NIC-based operations have been demonstrated to achieve better scalabili...
Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas