Advances in network technology continue to improve the communication performance of workstation and PC clusters, making high-performance workstation-cluster computing increasingly viable. These hardware advances, however, are taxing traditional host-software network protocols to the breaking point. A modern gigabit network can swamp a host's I/O bus and processor, limiting communication performance and slowing computation unacceptably. Fortunately, the host-programmable network processors used by these networks present a potential solution. Offloading selected host processing to these embedded network processors lowers host overhead and improves latency. This paper examines the use of embedded network processors to improve the performance of workstation-cluster global memory management. We have implemented a revised version of the GMS global memory system that reduces host overhead by as much as 29% on active nodes and improves page fault latency by as much as 39%.
Yvonne Coady, Joon Suan Ong, Michael J. Feeley