Multicast is an important collective operation for parallel programs. Some Network Interface Cards (NICs), such as those for Myrinet, have programmable processors that can be used to support multicast. This paper proposes a high-performance, reliable NIC-based multicast scheme, in which a NIC-based multisend mechanism is used to send multiple replicas of a message to different destinations, and a NIC-based forwarding mechanism is used to forward received packets without intermediate host involvement. We have explored different design alternatives and implemented the proposed scheme with the best set of alternatives over Myrinet/GM-2. MPICH-GM has also been modified to take advantage of this scheme. At the GM level, the NIC-based multicast improves the multicast latency by a fac
Weikuan Yu, Darius Buntinas, Dhabaleswar K. Panda