This paper explores compiler techniques for reducing the memory needed to load and run program executables. In embedded systems, where economic incentives to reduce both ram and r...
Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on s...
Nathan Clark, Jason A. Blome, Michael L. Chu, Scot...
The Distributed Shared Memory (DSM) model is designed to leverage the ease of programming of the shared memory paradigm, while enabling the highperformance by expressing locality ...
This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
This paper considers the efficient parallel implementation of control constructs and expressions written in a common software programming language and synthesised to FPGA platform...