In this paper we present a scalable dataflow hard- ware architecture optimized for the computation of general- purpose vision algorithms—neuFlow—and a dataflow compiler—luaFlow—that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow. This system was designed with the goal of pro- viding real-time detection, categorization and localization of objects in complex scenes, while consuming 10 Watts when implemented on a Xilinx Virtex 6 FPGA platform, or about ten times less than a laptop computer, and producing speedups of up to 100 times in real-world applications. We present an application of the system on street scene anal- ysis, segmenting 20 categories on 500 × 375 frames at 12 frames per second on our custom hardware neuFlow.
C. Farabet, B. Martini, B. Corda, P. Akselrod, E.