This paper introduces a new abstraction to accelerate the readbarriers and write-barriers used by language runtime systems. We exploit the fact that, dynamically, many barrier exe...
Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understand...
Nathan R. Tallent, John M. Mellor-Crummey, Michael...
Processor Idle Cycle Aggregation (PICA) is a promising approach for low power execution of processors, in which small memory stalls are aggregated to create a large one, and the p...
We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circ...
Many sensor nodes contain resource constrained microcontrollers where user level applications, operating system components, and device drivers share a single address space with no...