Hybrid architectures that combine general-purpose processors with application-specific hardware accelerators can significantly improve performance. Our hybrid architecture uses a Java Virtual Machine as an abstraction layer that hides the complexity of the hardware/software interface between processor and accelerator from the programmer. Data communication between the accelerator and the processor often incurs a significant cost, which can cancel out the speedup obtained by the accelerator. This article shows how we minimise this communication cost by dynamically choosing an optimal data layout in the Java heap, which is distributed over both the accelerator memory and the processor memory. The proposed self-learning memory allocation strategy uses runtime profiling to find the optimal location for each Java object's data. It reduces the communication cost by up to 86% for the benchmarks in the DaCapo suite (51% on averag...
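To make the idea of profile-driven data placement more concrete, here is a minimal Java sketch under several stated assumptions. It is not the allocator described in this article. The sketch assumes that access profiles are aggregated per allocation site rather than per individual object, that the heap spans exactly two memories, and that every access can be attributed to either the processor or the accelerator. The class `PlacementProfiler`, its methods and the allocation-site names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of profile-driven data placement (not the article's
 * allocator). For each allocation site it counts how often objects allocated
 * there are accessed by the processor versus the accelerator. New objects from
 * that site are then placed in the memory of the side that accessed them more.
 */
public class PlacementProfiler {

    /** The two physical memories over which the Java heap is assumed to be distributed. */
    public enum Memory { PROCESSOR, ACCELERATOR }

    // Access counters per allocation site, indexed by Memory.ordinal():
    // index 0 = processor accesses, index 1 = accelerator accesses.
    private final Map<String, long[]> counts = new HashMap<>();

    /** Records one access, by the given side, to an object allocated at the given site. */
    public void recordAccess(String allocationSite, Memory accessor) {
        long[] c = counts.computeIfAbsent(allocationSite, k -> new long[2]);
        c[accessor.ordinal()]++;
    }

    /**
     * Chooses where to allocate the next object from this site. Sites without
     * profile data, and ties, default to processor memory (an assumption).
     */
    public Memory choose(String allocationSite) {
        long[] c = counts.get(allocationSite);
        if (c == null) {
            return Memory.PROCESSOR;
        }
        return c[Memory.ACCELERATOR.ordinal()] > c[Memory.PROCESSOR.ordinal()]
                ? Memory.ACCELERATOR
                : Memory.PROCESSOR;
    }

    public static void main(String[] args) {
        PlacementProfiler profiler = new PlacementProfiler();
        // Simulated profile: objects from "Matrix.<init>" are mostly used by the accelerator.
        for (int i = 0; i < 10; i++) {
            profiler.recordAccess("Matrix.<init>", Memory.ACCELERATOR);
        }
        profiler.recordAccess("Matrix.<init>", Memory.PROCESSOR);
        // Objects from "Logger.<init>" are only used by the processor.
        profiler.recordAccess("Logger.<init>", Memory.PROCESSOR);

        System.out.println(profiler.choose("Matrix.<init>"));  // ACCELERATOR
        System.out.println(profiler.choose("Logger.<init>"));  // PROCESSOR
        System.out.println(profiler.choose("Unknown.<init>")); // PROCESSOR (no profile yet)
    }
}
```

The simple majority rule shown here ignores access sizes and the relative cost of local versus remote accesses. A real strategy would likely weight the counters by transfer cost and would need to revisit its decisions as the program's behaviour changes.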