Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with cachable, coherent memory operations. Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads (e.g., for polling). This paper begins an exploration of network interfaces (NIs) that use coherence--coherent network interfaces (CNIs)--to improve communication performance. We restrict this study to NI/CNIs that reside on coherent memory or I/O buses, to NI/CNIs that are much simpler than processors, and to the performance of fine-grain messaging from user process to user process. Our first contribution is to develop and optimize two mechanisms that CNIs use to communicate with processors. A cachable device register--derived from cachable control registers [39,40]--is a coheren...
Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hil