Sciweavers

643 search results - page 26 / 129
» Using Hardware Counters to Automatically Improve Memory Perf...
Sort
View
ICCS
2004
Springer
14 years 2 months ago
Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Henrik Löf, Markus Nordén, Sverker Hol...
MICRO
2009
IEEE
121views Hardware» more  MICRO 2009»
14 years 3 months ago
Improving memory bank-level parallelism in the presence of prefetching
DRAM systems achieve high performance when all DRAM banks are busy servicing useful memory requests. The degree to which DRAM banks are busy is called DRAM Bank-Level Parallelism ...
Chang Joo Lee, Veynu Narasiman, Onur Mutlu, Yale N...
TCAD
2008
167views more  TCAD 2008»
13 years 8 months ago
System-Level Dynamic Thermal Management for High-Performance Microprocessors
Abstract--Thermal issues are fast becoming major design constraints in high-performance systems. Temperature variations adversely affect system reliability and prompt worst-case de...
Amit Kumar 0002, Li Shang, Li-Shiuan Peh, Niraj K....
MICRO
2010
IEEE
270views Hardware» more  MICRO 2010»
13 years 6 months ago
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
Abstract-- We consider the problem of how to improve memory latency tolerance in massively multithreaded GPGPUs when the thread-level parallelism of an application is not sufficien...
Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim...
ISCAS
2002
IEEE
124views Hardware» more  ISCAS 2002»
14 years 1 months ago
Performance optimization of multiple memory architectures for DSP
Multiple memory module architecture offers higher performance by providing potentially doubled memory bandwidth. Two key problems in gaining high performance in this kind of archi...
Qingfeng Zhuge, Bin Xiao, Edwin Hsing-Mean Sha