Bulk data movement is common in server workloads, and it performs poorly on today’s microprocessors. We propose the use of small, dedicated copy engines and present a detailed analysis of bulk data copy engine architectures. We describe the hardware support required to implement a copy engine and to integrate it tightly into server platforms. Our evaluation is based on an execution-driven simulator that we extended with detailed models of bulk data movement engines. The simulation results show that dedicated engines are quite effective in eliminating the data movement overhead and are an attractive choice for handling bulk data in future high-performance server platforms.
Li Zhao, Ravi R. Iyer, Srihari Makineni, Laxmi N.