Sciweavers

92 search results - page 16 / 19
» Performance debugging shared memory multiprocessor programs ...
Sort
View
IPPS
1999
IEEE
13 years 11 months ago
A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality
In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
IPPS
1998
IEEE
13 years 11 months ago
COMPaS: A Pentium Pro PC-based SMP Cluster and Its Experience
We have built an eight node SMP cluster called COMPaS (Cluster Of Multi-Processor Systems), each node of which is a quadprocessor Pentium Pro PC. We have designed and implemented a...
Yoshio Tanaka, Motohiko Matsuda, Makoto Ando, Kazu...
IWOMP
2009
Springer
14 years 1 months ago
Scalability Evaluation of Barrier Algorithms for OpenMP
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is ...
Ramachandra C. Nanjegowda, Oscar Hernandez, Barbar...
EUROPAR
2008
Springer
13 years 9 months ago
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...
Marc Pérache, Hervé Jourdren, Raymon...
ICPP
1998
IEEE
13 years 11 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang