The creation and optimization of FPGA accelerators comprising several compute cores and memories are challenging tasks in high performance reconfigurable computing. In this paper, we present the design of such an accelerator for the kth nearest neighbor thinning problem on an XD1000 reconfigurable computing system. The design leverages IMORC, an architectural template and highly versatile on-chip interconnect, to achieve speedups of 74× over a 2.2GHz Opteron. Using IMORC with its asynchronous FIFOs and bitwidth conversion in the links between the cores, we are able to quickly create acclerator versions with varying degrees of core-level parallelism and memory mappings. Through the performance monitoring infrastructure of IMORC we gain insight into the data-dependent behavior of the accelerator which facilitates further performance optimizations.