We compare the number of cache misses M1 for running a computation on a single processor with cache size C1 to the total number of misses Mp for the same computation when using p processors or threads and a shared cache of size Cp. We show that for any computation, and with an appropriate
Guy E. Blelloch, Phillip B. Gibbons