Sciweavers

70 search results - page 6 / 14
» Programming for parallelism and locality with hierarchically...
ICPP
1993
IEEE
14 years 21 days ago
Automatic Parallelization Techniques for the EM-4
This paper presents a Data-Distributed Execution approach that exploits iteration-level parallelism in loops operating over arrays. It performs data-dependency analysis, based o...
Lubomir Bic, Mayez A. Al-Mouhamed
EUROPAR
2006
Springer
14 years 7 days ago
A Hierarchical CLH Queue Lock
Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality....
Victor Luchangco, Daniel Nussbaum, Nir Shavit
ICPP
1998
IEEE
14 years 25 days ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
ICS
2003
Tsinghua U.
14 years 1 month ago
Estimating cache misses and locality using stack distances
Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses, at compile time, using a mac...
Calin Cascaval, David A. Padua
CLUSTER
2004
IEEE
14 years 9 days ago
Predicting memory-access cost based on data-access patterns
Improving memory performance at the software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...