We present experimental evidence that multiple compute-units, compiled from sequential high-level language input programs, can be merged into a reduced number of configurations f...
Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common ...
Nathan Clark, Amir Hormati, Scott A. Mahlke, Sami ...
We investigate the problem of memory reuse in order to reduce the memory needed to store an array variable. We develop techniques that can lead to smaller memory requirements in...
The Single Instruction Multiple Data (SIMD) model for fine-grained parallelism was recently extended to support SIMD operations on disjoint vector elements. In this paper we demon...
Saving the internal data of an application in an external form is called marshalling. A generic marshaller is difficult to optimize because the format of the data that will be mars...
Baris Aktemur, Joel Jones, Samuel N. Kamin, Lars C...