This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...
The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabili...
Christian Bell, Wei-Yu Chen, Dan Bonachea, Katheri...
Multiple issue of instructions occurs in superscalar and VLIW machines. This paper investigates a third type of machine design, which combines the advantages of code compatibility...
This paper considers a recently proposed method for unsupervised learning and dimensionality reduction, locally linear embedding (LLE). LLE computes a compact representation of hi...
For memory constrained environments like embedded systems, optimization for program size is often as important, if not more important, as optimization for execution speed. Commonl...