Caches are very inefficiently utilized because not all the excess data fetched into the cache, to exploit spatial locality, is utilized. We define cache utilization as the percent...
In this paper we introduce the ConSet communication model for distributed memory parallel computers. The communication needs of an application program can be satisfied by some ar...
The Opie Project aims to develop a compiler to transform C codes written for row-major matrix representation into equivalent codes for Morton-order matrix representation, and to a...
GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
This paper considers the role of performance and area estimates from behavioral synthesis in design space exploration. We have developed a compilation system that automatically ma...