Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers



We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multiprocessors to enable vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until they all have arrived. Specifically, our approach uses invalidation requests to both make cache lines unavailable and identify when a thread has reached the barrier...
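To make the "high frequency of barriers" concrete, the following is a minimal sketch of inner-loop data parallelism in which every sweep over a vector ends in a barrier. It uses the software barrier from Python's standard library (threading.Barrier), not the hardware barrier filter described in the abstract. The function name parallel_stencil, the 3-point averaging kernel, and the thread count are hypothetical choices made only for illustration.

```python
import threading


def parallel_stencil(data, steps, num_threads=4):
    """Apply `steps` sweeps of a 3-point average, one barrier per sweep.

    Illustrative only: a software barrier stands in for the barrier
    mechanisms discussed in the abstract.
    """
    n = len(data)
    # Two buffers: each sweep reads one and writes the other.
    bufs = [list(data), list(data)]
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        # Each thread owns a contiguous slice of the vector.
        lo = tid * n // num_threads
        hi = (tid + 1) * n // num_threads
        for step in range(steps):
            cur = bufs[step % 2]
            nxt = bufs[(step + 1) % 2]
            for i in range(lo, hi):
                left = cur[i - 1] if i > 0 else cur[i]
                right = cur[i + 1] if i < n - 1 else cur[i]
                nxt[i] = (left + cur[i] + right) / 3.0
            # No thread may start the next sweep until every thread has
            # finished writing this one.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[steps % 2]
```

Because each sweep does only a few operations per element, the cost of barrier.wait() is paid once per sweep and can easily dominate the useful work, which is the overhead the abstract's fast barrier mechanisms are meant to reduce.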

Jack Sampson, Rubén González, Jean-F


Barrier Filters | Cache Lines | Hardware | MICRO 2006 | Onchip Communication Latencies |


» Enhancing L2 organization for CMPs with a center cell

» Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs

Post Info

Added	14 Dec 2010
Updated	14 Dec 2010
Type	Journal
Year	2006
Where	MICRO
Authors	Jack Sampson, Rubén González, Jean-Francois Collard, Norman P. Jouppi, Michael S. Schlansker, Brad Calder


Sciweavers
