Multicluster architectures overcome the scaling problem of centralized resources by distributing the datapath, register file, and memory subsystem across multiple clusters connected by a communication network. Traditional compiler partitioning algorithms focus solely on distributing operations across the clusters to maximize instruction-level parallelism; the distribution of data objects is generally ignored. In this work, we examine explicit partitioning of data objects and its effects on operation partitioning. The partitioning of data objects must consider several factors: object size, access frequency/pattern, and dependence patterns between operations that manipulate the objects. This work proposes a compiler-directed approach to synergistically partition both data objects and computation across multiple clusters. First, a global view of the application determines the interaction between data memory objects and their associated computation. Next, data objects are partitioned acr...
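The abstract names the factors a data partitioner must weigh but does not give the algorithm. As a hypothetical illustration only, not the authors' method, the Python sketch below shows one simple greedy way those factors could interact. It assumes each data object has a known size and a profiled access count, that an "affinity" weight between two objects summarizes how many dependent operations touch both, and that each cluster's local memory has a fixed capacity. The names `DataObject`, `partition_objects`, `affinity`, and `capacity` are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size: int      # bytes occupied in a cluster's local memory (assumed)
    accesses: int  # profiled access count (assumed)


def partition_objects(objects, affinity, num_clusters, capacity):
    """Hypothetical greedy data-object partitioner.

    Visits objects in decreasing access frequency. Each object goes to the
    cluster holding the most affinity weight with objects already placed
    there (keeping dependent accesses local); ties go to the cluster with
    the lowest access load (spreading work for parallelism), then to the
    lowest cluster index. A cluster is skipped if the object would exceed
    its memory capacity.
    """
    assignment = {}
    used = [0] * num_clusters  # bytes placed on each cluster
    load = [0] * num_clusters  # accesses directed at each cluster

    for obj in sorted(objects, key=lambda o: (-o.accesses, o.name)):
        best, best_key = None, None
        for c in range(num_clusters):
            if used[c] + obj.size > capacity:
                continue  # object does not fit on this cluster
            # Affinity gained by co-locating obj with already-placed objects on c.
            gain = sum(
                w for (a, b), w in affinity.items()
                if (a == obj.name and assignment.get(b) == c)
                or (b == obj.name and assignment.get(a) == c)
            )
            key = (gain, -load[c], -c)
            if best_key is None or key > best_key:
                best, best_key = c, key
        if best is None:
            raise ValueError(f"object {obj.name} does not fit in any cluster")
        assignment[obj.name] = best
        used[best] += obj.size
        load[best] += obj.accesses
    return assignment


if __name__ == "__main__":
    objs = [
        DataObject("A", 64, 100),
        DataObject("B", 64, 80),
        DataObject("C", 64, 50),
        DataObject("D", 64, 10),
    ]
    aff = {("A", "B"): 5, ("C", "D"): 3}
    print(partition_objects(objs, aff, num_clusters=2, capacity=128))
    # prints {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

In this toy run, A and B share a cluster because of their affinity, which fills that cluster's capacity and pushes C and D to the other cluster. Pure affinity would tend to pile objects onto one cluster and serialize work, so the load tie-break and capacity limit stand in, crudely, for the tension between data locality and instruction-level parallelism that the abstract describes.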
Michael L. Chu, Scott A. Mahlke