Low-latency remote-write networks, such as DEC’s Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared memory multiprocessors (SMPs). The challenge is to take advantage of hardwareshared memory for sharing within an SMP, and to ensure that software overheadis incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a “twolevel” software coherent shared memory system—Cashmere-2L— that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel’s remote-write capabilities to implement “moderately lazy” release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are ...