Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed t...
Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hu...
To expand the Scalable Coherent Interface's (SCI) capabilities so it can be used to efficiently handle sharing in systems of hundreds or even thousands of processors, the SCI...
— In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested ins...
K. V. M. Naidu, Rajeev Rastogi, Scott Satkin, Anan...
A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The under...
David S. Wise, Craig Citro, Joshua Hursey, Fang Li...