Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independentl...
The Tensor Contraction Engine (TCE) is a domain-specific compiler for implementing complex tensor contraction expressions arising in quantum chemistry applications modeling elect...
Execution and communication traces are central to performance modeling and analysis. Since the traces can be very long, meaningful compression and extraction of representative beha...
Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compile...
In retargeting loop-based code for multimedia instruction set extensions, a critical issue is that vector data types of mixed precision within a loop body complicate the paralleli...
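The mixed-precision problem in the last abstract can be illustrated with a small emulation. This is a minimal Python sketch, not the method of the cited work. It assumes a hypothetical 128-bit vector register and a loop body `a[i] += b[i]` in which `a` holds 32-bit integers and `b` holds 16-bit integers. The names `add_scalar` and `add_vectorized` and the constant `REG_BITS` are illustrative. One register holds 8 int16 lanes but only 4 int32 lanes, so every int16 vector load has to be widened (unpacked) into two int32 vectors before the additions can proceed.

```python
REG_BITS = 128              # assumed SIMD register width (hypothetical)
LANES16 = REG_BITS // 16    # 8 int16 lanes per register
LANES32 = REG_BITS // 32    # 4 int32 lanes per register


def add_scalar(a, b):
    """Reference loop: a[i] = a[i] + b[i], with a as int32 and b as int16."""
    return [x + y for x, y in zip(a, b)]


def add_vectorized(a, b):
    """Emulate a vectorized version of the same loop.

    Returns the result and the number of widening (unpack) steps. Each
    int16 load yields LANES16 elements, but an int32 register holds only
    LANES32, so each load is split into LANES16 // LANES32 int32 vectors.
    """
    assert len(a) == len(b)
    out = list(a)
    n = len(b)
    unpacks = 0
    i = 0
    while i + LANES16 <= n:
        vb = b[i:i + LANES16]  # one int16 vector load
        for part in range(LANES16 // LANES32):
            # In hardware this slice would be sign-extended to int32;
            # Python ints need no conversion, so we only count the step.
            wide = vb[part * LANES32:(part + 1) * LANES32]
            unpacks += 1
            base = i + part * LANES32
            for k in range(LANES32):  # one int32 vector add
                out[base + k] += wide[k]
        i += LANES16
    for j in range(i, n):  # scalar epilogue for leftover elements
        out[j] += b[j]
    return out, unpacks
```

For a 20-element loop, the sketch performs two int16 vector loads and four widening steps, then handles the last four elements in the scalar epilogue. The widening steps are the extra work, absent from a single-precision loop, that makes mixed-precision loop bodies harder to parallelize.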