Sciweavers

78 search results - page 9 / 16
» Automatic memory partitioning: increasing memory parallelism...
Sort
View
ICCS
2004
Springer
14 years 1 months ago
Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Henrik Löf, Markus Nordén, Sverker Hol...
DSD
2010
IEEE
161views Hardware» more  DSD 2010»
13 years 8 months ago
Design of Trace-Based Split Array Caches for Embedded Applications
—Since many embedded systems execute a predefined set of programs, tuning system components to application programs and data is the approach chosen by many design techniques to o...
Alice M. Tokarnia, Marina Tachibana
IPPS
2010
IEEE
13 years 5 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixedfunction processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
HIPC
2000
Springer
13 years 11 months ago
Meta-data Management System for High-Performance Large-Scale Scientific Data Access
Many scientific applications manipulate large amount of data and, therefore, are parallelized on high-performance computing systems to take advantage of their computational power a...
Wei-keng Liao, Xiaohui Shen, Alok N. Choudhary
CF
2005
ACM
13 years 9 months ago
Drowsy region-based caches: minimizing both dynamic and static power dissipation
Power consumption within the memory hierarchy grows in importance as on-chip data caches occupy increasingly greater die area. Among dynamic power conservation schemes, horizontal...
Michael J. Geiger, Sally A. McKee, Gary S. Tyson