Sciweavers

ICPP
2008
IEEE
14 years 1 month ago
Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches
This paper presents a two-part study on managing distributed NUCA (Non-Uniform Cache Architecture) L2 caches in a future manycore processor to obtain high single-thread program per...
Lei Jin, Sangyeun Cho
HPCC
2009
Springer
13 years 11 months ago
Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory
Transactional Memory (TM) is an emerging technology which promises to make parallel programming easier. However, to be efficient, the underlying TM system should protect only...
Sutirtha Sanyal, Sourav Roy, Adrián Cristal...
IPPS
2006
IEEE
14 years 1 month ago
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
Due to shared cache contentions and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiproce...
Xudong Shi, Zhen Yang, Jih-Kwon Peir, Lu Peng, Yen...
IPPS
2006
IEEE
14 years 1 month ago
Making lockless synchronization fast: performance implications of memory reclamation
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace l...
Thomas E. Hart, Paul E. McKenney, Angela Demke Bro...
IEEEPACT
2008
IEEE
14 years 1 month ago
Adaptive insertion policies for managing shared caches
Chip Multiprocessors (CMPs) allow different applications to concurrently execute on a single chip. When applications with differing demands for memory compete for a shared cache, ...
Aamer Jaleel, William Hasenplaugh, Moinuddin K. Qu...