This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit ...
John H. Kelm, Daniel R. Johnson, Steven S. Lumetta...
In this paper, we describe the design and implementation of an integrated architecture for cache systems that scale to hundreds or thousands of caches with thousands to millions o...
Renu Tewari, Michael Dahlin, Harrick M. Vin, Jonat...
In this paper a parametrizable architecture of a motion estimator (ME) is presented. The ME is designed as a generic full pixel calculation module which can be adopted for dieren...
We are currently developing Willow, a shared-memory multiprocessor whose design provides system capacity and performance capable of supporting over a thousand commercial microproc...
John K. Bennett, Sandhya Dwarkadas, Jay A. Greenwo...
Recently-proposed processor microarchitectures for high Memory Level Parallelism (MLP) promise substantial performance gains. Unfortunately, current cache hierarchies have Miss-Ha...