Sciweavers

128 search results - page 20 / 26
» Comparative Evaluation of Latency Tolerance Techniques for S...
Sort
View
CGO
2004
IEEE
13 years 11 months ago
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the everincreasing memory latency. A number of pre-execution tech...
Dongkeun Kim, Shih-Wei Liao, Perry H. Wang, Juan d...
ICS
2005
Tsinghua U.
14 years 1 months ago
Towards automatic translation of OpenMP to MPI
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
Ayon Basumallik, Rudolf Eigenmann
PPOPP
2009
ACM
14 years 8 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
WOSP
2004
ACM
14 years 29 days ago
Using locality of reference to improve performance of peer-to-peer applications
Peer-to-peer, or simply P2P, systems have recently emerged as a popular paradigm for building distributed applications. One key aspect of the P2P system design is the mechanism us...
Marcelo Werneck Barbosa, Melissa Morgado Costa, Ju...
IPPS
2007
IEEE
14 years 1 months ago
A Power-Aware Prediction-Based Cache Coherence Protocol for Chip Multiprocessors
Snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We propos...
Ehsan Atoofian, Amirali Baniasadi