Modern embedded compute platforms increasingly contain both microprocessors and field-programmable gate arrays (FPGAs). The FPGAs may implement accelerators or other circuits to s...
Scalable shared-memory multiprocessors that are designed as Cache-Only Memory Architectures Coma allow automatic replication and migration of data in the main memory. This enhance...
Aggressive compiler optimizations such as software pipelining and loop invariant code motion can significantly improve application performance, but these transformations often re...
Chris Zimmer, Stephen Roderick Hines, Prasad Kulka...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...
The rapid revolution in microprocessor chip architecture due to multicore technology is presenting unprecedented challenges to the application developers as well as system softwar...