This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by high-speed networks. This architecture permits (1) the transfer of selected communication-related functionality from the host machine to the network interface coprocessor, and (2) the exposure of this functionality directly to applications as instructions of a Virtual Communication Machine (VCM) implemented by the coprocessor. User-level code interacts directly with the network coprocessor; the host kernel only "connects" the application to the VCM and does not participate in the data transfers. The distinctive feature of our design is its flexibility: the degree of integration between the network and the application can be varied to maximize performance. The resulting communication architecture is characterized by a very low overhead on the host processor, by latency and bandwidth close to the hardware limits, and by an application ...
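The division of labor described above, in which the kernel is involved once to connect an application to the VCM and then stays off the data path, can be illustrated with a toy model. The sketch below rests on assumptions of our own: the class names, the per-application `deque` standing in for a command queue shared between user space and the coprocessor, and the single `SEND` opcode are all hypothetical and do not describe the actual VCM instruction set or its implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class VcmInstr:
    """One hypothetical VCM instruction posted by user-level code."""
    opcode: str      # assumed opcode name; only "SEND" is modeled here
    payload: bytes


class Coprocessor:
    """Toy stand-in for the network interface coprocessor running the VCM."""

    def __init__(self) -> None:
        self.queues: dict[int, deque] = {}        # one command queue per connected application
        self.wire: list[tuple[int, bytes]] = []   # frames "transmitted" on the network

    def run(self, app_id: int) -> int:
        """Drain the application's queue and return how many instructions were executed."""
        queue = self.queues[app_id]
        executed = 0
        while queue:
            instr = queue.popleft()
            if instr.opcode != "SEND":
                raise ValueError(f"unsupported opcode {instr.opcode!r}")
            self.wire.append((app_id, instr.payload))
            executed += 1
        return executed


class Kernel:
    """Toy host kernel: sets up the connection but never touches the data path."""

    def __init__(self, coprocessor: Coprocessor) -> None:
        self.coprocessor = coprocessor
        self.calls = 0   # number of times the kernel was involved

    def connect(self, app_id: int) -> deque:
        """Register a command queue with the coprocessor and hand it to user space."""
        self.calls += 1
        queue: deque = deque()
        self.coprocessor.queues[app_id] = queue
        return queue


cop = Coprocessor()
kernel = Kernel(cop)
queue = kernel.connect(app_id=7)           # the only kernel involvement
for chunk in (b"a", b"bc", b"def"):
    queue.append(VcmInstr("SEND", chunk))  # user-level code posts instructions directly
executed = cop.run(7)                      # the coprocessor executes them
```

After the three sends, `kernel.calls` is still 1: every data transfer went from user space to the coprocessor without another kernel call, which is the property the abstract attributes to the architecture.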