Snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We propos...
Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many imp...
Our recent work on uniprocessor and single-node multiprocessor (SMP) active memory systems uses address remapping techniques in conjunction with extended cache coherence protocols...
Abstract—Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latencytolerating techniques such as out-of-order execution and non-blocking c...
We are currently developing Willow, a shared-memory multiprocessor whose design provides system capacity and performance capable of supporting over a thousand commercial microproc...
John K. Bennett, Sandhya Dwarkadas, Jay A. Greenwo...