In this paper we examine how application performance scales on a state-of-the-art shared virtual memory (SVM) system running on a 64-processor cluster of 4-way SMPs connected by a fast system area network. The protocol we use is home-based and takes advantage of general-purpose data movement and mutual exclusion support provided by a programmable network interface. We find that considerable application restructuring is needed relative to versions that already perform well on a hardware-coherent system of this scale, and that larger problem sizes are needed for good performance. Even so, SVM performs surprisingly well at the 64-processor scale for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end hardware-coherent system and often much more. We explore application restructurings beyond those developed earlier for smaller-scale SVM systems, examine the main remaining system and application bottlenecks, and point out...