In cache-based multiprocessors, a protocol must maintain coherence among replicated copies of shared writable data. In delayed consistency protocols, the effects of outgoing and incoming invalidations or updates are delayed. Delayed coherence can reduce processor blocking time as well as the effects of false sharing. In this paper, we introduce several implementations of delayed consistency for cache-based systems in the framework of a weakly ordered consistency model. We compare the performance of the delayed protocols with that of the corresponding On-the-Fly (non-delayed) consistency protocol through execution-driven simulations of four parallel algorithms. The results show that, for parallel programs in which false sharing is a problem, the data miss rate can be reduced significantly with only a small increase in the cost and complexity of the cache system.