Sciweavers

1022 search results - page 180 / 205
» Automatic data and computation decomposition on distributed ...
Sort
View
SPAA
2012
ACM
11 years 10 months ago
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools
We present a highly-scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block o...
Elad Gidron, Idit Keidar, Dmitri Perelman, Yonatha...
HPCA
2007
IEEE
14 years 1 months ago
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Liqun Cheng, John B. Carter, Donglai Dai
EUROPAR
2000
Springer
13 years 11 months ago
Design and Evaluation of a Compiler-Directed Collective I/O Technique
Abstract. Current approaches to parallel I/O demand extensive user effort to obtain acceptable performance. This is in part due to difficulties in understanding the characteristics...
Gokhan Memik, Mahmut T. Kandemir, Alok N. Choudhar...
CLUSTER
2004
IEEE
13 years 11 months ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé
SPAA
1996
ACM
13 years 11 months ago
From AAPC Algorithms to High Performance Permutation Routing and Sorting
Several recent papers have proposed or analyzed optimal algorithms to route all-to-all personalizedcommunication (AAPC) over communication networks such as meshes, hypercubes and ...
Thomas Stricker, Jonathan C. Hardwick