This paper presents a multiprocessor system on FPGA that uses Direct Memory Access (DMA) mechanisms to move data between external memory and the local memory of each processor. The system exposes all standard DMA primitives through a fast Application Programming Interface (API); it relies on interrupts and can also manage a command list. This interface makes it possible to program an embedded multiprocessor architecture on FPGA with simple DMAs using the same DMA techniques adopted on high-performance multiprocessors with complex DMA controllers. Several experiments demonstrate the performance of our solution, which yields a 57% improvement in the execution time of a selected set of benchmarks. We furthermore show how DMA programming techniques such as double and multi-buffering can be used effectively on our platform, thus easing the design and development of hardware and software in a reconfigurable DMA-based environment.