Sciweavers

41 search results - page 6 / 9
» The Latency Hiding Effectiveness of Decoupled Access Execute...
Sort
View
JPDC
2010
106views more  JPDC 2010»
13 years 5 months ago
Feedback-directed page placement for ccNUMA via hardware-generated memory traces
Non-uniform memory architectures with cache coherence (ccNUMA) are becoming increasingly common, not just for large-scale high performance platforms but also in the context of mul...
Jaydeep Marathe, Vivek Thakkar, Frank Mueller
IPPS
2000
IEEE
13 years 11 months ago
Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
Jim Nilsson, Fredrik Dahlgren
HPCA
2000
IEEE
13 years 11 months ago
Coherence Communication Prediction in Shared-Memory Multiprocessors
Abstract—Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latencytolerating techniques such as out-of-order execution and non-blocking c...
Stefanos Kaxiras, Cliff Young
MICRO
2000
IEEE
176views Hardware» more  MICRO 2000»
13 years 7 months ago
An Advanced Optimizer for the IA-64 Architecture
level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unrolland-jam transformations exploit the large regist...
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....
IPPS
2009
IEEE
14 years 2 months ago
Implementing and evaluating multithreaded triad census algorithms on the Cray XMT
Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysi...
George Chin Jr., Andrès Márquez, Sut...