—Starting from sequential programs, we present an approach combining data reuse, multi-level MapReduce, and pipelining to automatically find the most power-efficient designs that meet speed and area constraints in the design space on FieldProgrammable Gate Arrays (FPGAs). This combined approach enables trade-offs in power, speed and area: we show 63% reduction in power can be achieved with 27% increase in execution time. Compared to the sequential designs, our approach yields designs with up to 158 times reduction in execution time. Moreover, for a given execution time, our combined approach