Data movement operations, such as the C-style memcpy function, are often used to duplicate or communicate data. Such functions typically generate a significant amount of off-chip traffic. For current microprocessors, communication with off-chip memory increasingly limits performance and is a significant source of energy consumption. To reduce the communication between a CPU and the off-chip memory system, we propose a system that implements memcpy in hardware at the memory level where the source data is located.
Pepijn J. de Langen, Ben H. H. Juurlink