Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
Memory-processor integration o ers new opportunities for reducing the energy of a system. In the case of embedded systems, one solution consists of mapping the most frequently acc...
Many System-on-a-Chip devices would benefit from the inclusion of reprogrammable logic on the silicon die, as it can add general computing ability, provide run -time reconfigurabil...
High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, timetested, meritorious code that are impractical...
Tristan Ravitch, Steve Jackson, Eric Aderhold, Ben...