Commercial soft processors cannot effectively exploit the data parallelism present in many embedded systems workloads, forcing FPGA designers to exploit it laboriously through manual hardware design. Recent research [1, 2] has demonstrated that soft processors augmented with support for vector instructions provide significant improvements in performance and scalability for data-parallel workloads. These soft vector processors provide a software environment for quickly encoding data-parallel computation, but their competitiveness with manual hardware design in terms of area and performance remains unknown. In this work, we run data-parallel EEMBC embedded benchmarks on an FPGA platform equipped with DDR memory and measure the area/performance gaps between (i) a scalar soft processor, (ii) our improved soft vector processor, and (iii) custom FPGA hardware. We demonstrate that the 432x wall-clock performance gap between scalar-executed C and custom hardware can be reduced s...
Peter Yiannacouras, J. Gregory Steffan, Jonathan R