This paper addresses a purely software-based solution to the multiprocessor cache coherence problem: the operating system is structured to maintain the coherence of its own data while exporting coherent memory to user processes. It also presents the results of a proof-of-concept port of Mach 3.0, based on these principles, to a prototype of the IBM Shared Memory System POWER/4™, a Shared Memory Cluster. This is believed to be the first implementation of a commercial operating system on a non-cache-coherent machine, and it required the development of a software technique to detect coherence violations. Benchmark results show that on the four-CPU system this solution delivers up to 3.9 times the throughput of a single processor.
Ronald L. Rockhold, James L. Peterson