Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
The Cray MTA-2 system provides exceptional performance on a variety of sparse graph algorithms. Unfortunately, it was an extremely expensive platform. Cray is preparing an Eldorad...
Keith D. Underwood, Megan Vance, Jonathan W. Berry...
We propose an interleaved memory organization supporting multi-pattern parallel accesses in twodimensional (2D) addressing space. Our proposal targets computing systems with high ...
Arseni Vitkovski, Georgi Kuzmanov, Georgi Gaydadji...
In this paper we propose a design technique to pipeline cache memories for high bandwidth applications. With the scaling of technology cache access latencies are multiple clock cy...
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocesso...
Walter Lee, Rajeev Barua, Matthew Frank, Devabhakt...