In the study of PetaFlop project, Processor-In-Memory array was proposed to be a target architecture in achieving 1015 floating point operations per second computing performance. ...
Yi Tian, Edwin Hsing-Mean Sha, Chantana Chantrapor...
In many of embedded systems, particularly for those with high data computations, the delay of memory access is one of the major bottlenecks in the system's performance. It ha...
level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unrolland-jam transformations exploit the large regist...
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....
Abstract. The data redistribution problems on multi-computers had been extensively studied. Irregular data redistribution has been paid attention recently since it can distribute d...
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...