We propose an execution model that orchestrates the fine-grained interaction of a conventional general-purpose processor (GPP) and a high-speed reconfigurable hardware accelerator (HA), the latter having full master-mode access to memory. We then describe how the resulting requirements can actually be realized efficiently in a custom computer by hardware architecture and system software measures. One of these is a low-latency HA-to-GPP signaling scheme with latency up to 23x times shorter than conventional approaches. Another one is a high-bandwidth shared memory interface that does not interfere with time-critical operating system functions executing on the GPP, and still makes 89% of the physical memory bandwidth available to the HA. Finally, we show two schemes with different flexibility / performance trade-offs for running the HA in protected virtual memory scenarios. All of the techniques and their interactions are evaluated at the system level using the full-scale virtual memory ...