Abstract. The main contribution of this work is to propose a number of broadcast-efficient VLSI architectures for computing the sum and the prefix sums of a w^k-bit, k ≥ 2, binary sequenc...
Rong Lin, Koji Nakano, Stephan Olariu, Maria Crist...
PAWS (Parallel Application WorkSpace) is a software infrastructure for use in connecting separate parallel applications within a component-like model. A central PAWS Controller co...
Peter H. Beckman, Patricia K. Fasel, William F. Hu...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
In this paper, we present several novel strategies to improve software controlled cache utilization, so as to achieve lower power requirements for multi-media and signal processin...
The ability to automatically parallelize standard programming languages results in program portability across a wide range of machine architectures. It is the goal of the Polaris ...
William Blume, Rudolf Eigenmann, Keith Faigin, Joh...