We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circ...
The development of energy-conscious embedded and/or mobile systems exposes a trade-off between energy consumption and system performance. Recent microprocessors have incorporated ...
Ankush Varma, Brinda Ganesh, Mainak Sen, Suchismit...
Causal request traces are valuable to developers of large concurrent and distributed applications, yet difficult to obtain. Traces show how a request is processed, and can be anal...
Abstract. As processor complexity increases compilers tend to deliver suboptimal performance. Library generators such as ATLAS, FFTW and SPIRAL overcome this issue by empirically s...
The wide-spread use of microprocessor based systems that utilize cache memory to alleviate excessively long DRAM access times introduces a new dimension in the quest to obtain goo...