On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
—Since many embedded systems execute a predefined set of programs, tuning system components to application programs and data is the approach chosen by many design techniques to o...
The graphics processing unit (GPU) has evolved from a fixedfunction processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Many scientific applications manipulate large amount of data and, therefore, are parallelized on high-performance computing systems to take advantage of their computational power a...
Power consumption within the memory hierarchy grows in importance as on-chip data caches occupy increasingly greater die area. Among dynamic power conservation schemes, horizontal...