Ever-increasing memory footprint of applications and increasing mainstream popularity of shared memory parallel computing motivate us to explore memory compression potential in distributed shared memory (DSM) multiprocessors. This paper for the first time integrates on-the-fly cache block compression/decompression algorithms in the cache coherence protocols by leveraging the directory structure already present in these scalable machines. Our proposal is unique in the sense that instead of employing custom compression/decompression hardware, we use a simple on-die protocol processing core in dual-core nodes for running our directory-based coherence protocol suitably extended with compression/decompression algorithms. We design a lowoverhead compression scheme based on frequent patterns and zero runs present in the evicted dirty L2 cache blocks. Our compression algorithm examines the first eight bytes of an evicted dirty L2 block arriving at the home memory controller and speculates ...