Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has b...
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
Code specialization is a way to obtain significant improvement in the performance of an application. It works by exposing values of different parameters in source code. The availa...
Minhaj Ahmad Khan, Henri-Pierre Charles, Denis Bar...
Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
Typed Assembly Languages (TALs) can be used to validate the safety of assembly-language programs. However, typing rules are usually trusted as axioms. In this paper, we show how to...
Gang Tan, Andrew W. Appel, Kedar N. Swadi, Dinghao...