Using FPGAs to accelerate High Performance Computing (HPC) applications is attractive, but it carries a huge associated cost: the time spent not on developing efficient FPGA code, but on handling the interfaces between CPUs and FPGAs. The usual difficulties are discovering interface libraries and tools, and selecting methods to debug and optimize the communications. Our GALS (Globally Asynchronous Locally Synchronous) system design framework, originally designed for embedded systems, turns out to be outstanding for programming and debugging HPC systems with reconfigurable FPGAs. Its co-simulation capabilities and automatic regeneration of interfaces allow an incremental design strategy in which the HPC programmer co-designs both software and hardware on the host. It then offers the flexibility to move components from a software abstraction to a Verilog/VHDL simulator, and eventually to FPGA targets, with automatic generation of asynchronous interfaces. The whole design including the...