A Load/Store Unit for a Memcpy Hardware Accelerator



Recently, a dedicated hardware accelerator was proposed that works in conjunction with the caches found next to modern-day microprocessors to speed up the commonly used memcpy operation. The main assumption of that proposal was that the data to be copied already resides in the cache, which does not always hold. In this paper, we present a dedicated load/store unit, and its implementation, that cooperates with the previously proposed memcpy hardware accelerator and the cache to ensure that the data becomes available in the cache. Experimental results using synthetic benchmarks show that the load/store unit, in conjunction with the memcpy hardware accelerator, can reduce memcpy latencies by 85% (when the data is not present in the cache) compared to a highly optimized software solution hand-coded in assembly.

Stamatis Vassiliadis, Filipa Duarte, Stephan Wong


Dedicated Hardware Accelerator | FPL 2007 | Hardware | Hardware Accelerator | Memcpy Hardware Accelerator |


Post Info

Added	07 Jun 2010
Updated	07 Jun 2010
Type	Conference
Year	2007
Where	FPL
Authors	Stamatis Vassiliadis, Filipa Duarte, Stephan Wong
