Abstract. An MPI library called MPICH-PM/CLUMP has been implemented on a cluster of SMPs. To achieve high-performance communication, MPICH-PM/CLUMP realizes zero-copy message passing between nodes and one-copy message passing within a node. To realize one-copy message passing on an SMP, a kernel primitive has been designed that enables a process to read the data of another process, and a get protocol using this primitive has been added to MPICH. MPICH-PM/CLUMP has been run on an SMP cluster consisting of 64 Pentium II dual processors connected by Myrinet. It achieves 98 MBytes/sec between nodes and 100 MBytes/sec within a node.
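
The difference between a one-copy get and a transfer staged through a shared buffer can be illustrated with a toy model. The sketch below is not MPICH-PM/CLUMP code: the names `Process`, `send_via_shared_buffer`, and `send_via_get` are hypothetical, each address space is modeled as a `bytearray`, and the two-copy shared-buffer path is only an assumed baseline for comparison. The direct read in `send_via_get` stands in for the kernel primitive that lets one process read another's data.

```python
# Toy model that counts memory copies for two intra-node transfer strategies.
# All names are hypothetical; this only illustrates the copy counts.


class Process:
    """A process whose private address space is modeled as a bytearray."""

    def __init__(self, size):
        self.memory = bytearray(size)


def copy(dst, dst_off, src, src_off, length, counter):
    """Copy `length` bytes between buffers and record one copy operation."""
    dst[dst_off:dst_off + length] = src[src_off:src_off + length]
    counter["copies"] += 1


def send_via_shared_buffer(sender, receiver, src_off, dst_off, length):
    """Assumed baseline: sender -> shared buffer -> receiver (two copies)."""
    counter = {"copies": 0}
    shared = bytearray(length)  # staging area visible to both processes
    copy(shared, 0, sender.memory, src_off, length, counter)
    copy(receiver.memory, dst_off, shared, 0, length, counter)
    return counter["copies"]


def send_via_get(sender, receiver, src_off, dst_off, length):
    """Get-style path: the sender only advertises (offset, length); the
    receiver then reads the sender's memory directly (one copy)."""
    counter = {"copies": 0}
    descriptor = (src_off, length)  # what the sender would publish
    off, n = descriptor
    # Stand-in for the kernel primitive that reads another process's data.
    copy(receiver.memory, dst_off, sender.memory, off, n, counter)
    return counter["copies"]


# Small demonstration: move the same 5 bytes with each strategy.
a, b = Process(64), Process(64)
a.memory[0:5] = b"hello"
copies_shared = send_via_shared_buffer(a, b, 0, 0, 5)
copies_get = send_via_get(a, b, 0, 8, 5)
print(copies_shared, copies_get)
```

In this model the payload arrives intact either way; only the number of times it is copied differs, which is the cost the one-copy design aims to reduce.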