Sciweavers

120 search results - page 10 / 24
» Analysis of Memory Hierarchy Performance of Block Data Layou...
Sort
View
PPOPP
1999
ACM
13 years 12 months ago
Automatic Parallelization of Divide and Conquer Algorithms
Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
Radu Rugina, Martin C. Rinard
ICS
2007
Tsinghua U.
14 years 1 months ago
Adaptive Strassen's matrix multiplication
Strassen’s matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen’s reduces the total number of operations. Strasse...
Paolo D'Alberto, Alexandru Nicolau
CASES
2010
ACM
13 years 5 months ago
Improved procedure placement for set associative caches
The performance of most embedded systems is critically dependent on the memory hierarchy performance. In particular, higher cache hit rate can provide significant performance boos...
Yun Liang, Tulika Mitra
CODES
2001
IEEE
13 years 11 months ago
A design framework to efficiently explore energy-delay tradeoffs
Comprehensive exploration of the design space parameters at the system-level is a crucial task to evaluate architectural tradeoffs accounting for both energy and performance const...
William Fornaciari, Donatella Sciuto, Cristina Sil...
PLDI
1995
ACM
13 years 11 months ago
Unifying Data and Control Transformations for Distributed Shared Memory Machines
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
Michal Cierniak, Wei Li