The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped outputs to produce a final result. By exposing structural similarity in this way, a number of key issues associated with the design of custom computing machines including parallelisation; design complexity; software-hardware partitioning; hardware-dependency, portability and scalability can be easily addressed. We present an implementation of a map-reduce library supporting parallel field programmable gate arrays (FPGAs) and graphics processing units (GPUs). Parallelisation due to pipelining, multiple datapaths and concurrent execution of FPGA/GPU hardware is automatically achieved. Users first specify the map and reduce steps for the problem in ANSI C and no knowledge of the underlying hardware or parallelisation is needed. The source code is then manually translated into a pipelined datapath which, along wi...
Jackson H. C. Yeung, C. C. Tsang, Kuen Hung Tsoi,