This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
In this paper, an adaptive wormhole router for a flexible on-chip interconnection network is proposed and implemented for a Chip-Multi Processor (CMP). It adopts a wormhole switc...
Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these diffe...
Kimberly Keeton, David A. Patterson, Yong Qiang He...
We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circ...
In the Sesame framework, we develop a modeling and simulation environment for the efficient design space exploration of heterogeneous embedded systems. Since Sesame recognizes se...