MapReduce has been prevalent for running data-parallel applications. By hiding other non-functionality parts such as parallelism, fault tolerance and load balance from programmers,...
Shengkai Zhu, Zhiwei Xiao, Haibo Chen, Rong Chen, ...
Widespread adaptation of shared memory programming for High Performance Computing has been inhibited by a lack of standardization and the resulting portability problems between pl...
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...
FPGA (Field Programmable Gate Array) based reconfigurable processor has been shown to meet the increasingly challenging performance targets and shorter time-to-market pressures. I...
With the ever-increasing pervasiveness of the Internet and its stringent performance requirements, network system designers have begun utilizing specialized chips to increase the ...