Sciweavers

Search query: Hiding Communication Latency in Data Parallel Applications
ISCA 2000, IEEE
An embedded DRAM architecture for large-scale spatial-lattice computations
Spatial-lattice computations with finite-range interactions are an important class of easily parallelized computations. This class includes many simple and direct algorithms for ...
Norman Margolus
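To make "spatial-lattice computation with finite-range interactions" concrete, here is a minimal sketch using Conway's Game of Life as a stand-in example (an assumption for illustration, not an algorithm taken from the paper, and the name `life_step` is hypothetical). Each cell's next state depends only on its 3x3 neighbourhood, so every cell can be updated independently; that locality is what makes this class of computations easy to parallelize.

```python
from typing import List

Grid = List[List[int]]


def life_step(grid: Grid) -> Grid:
    """One synchronous Game of Life update on a periodic (toroidal) lattice.

    Illustrative stand-in for a finite-range lattice rule: each cell reads
    only its 8 neighbours (interaction range 1), so all cells could be
    computed in parallel from the old grid.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours, wrapping around the lattice edges.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            alive = grid[r][c] == 1
            # Birth on exactly 3 neighbours; survival on 2 or 3.
            new[r][c] = 1 if (n == 3 or (alive and n == 2)) else 0
    return new


if __name__ == "__main__":
    # A "blinker": a horizontal bar of three cells turns vertical.
    g = [[0] * 5 for _ in range(5)]
    g[2][1] = g[2][2] = g[2][3] = 1
    for row in life_step(g):
        print("".join("#" if x else "." for x in row))
```

Because the new grid is written separately from the old one, the loop over cells has no ordering dependence, which is the property a parallel (or hardware) implementation would exploit.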
PC 2002
The Chebyshev iteration revisited
Compared to Krylov space methods based on orthogonal or oblique projection, the Chebyshev iteration does not require inner products and is therefore particularly suited for massiv...
Martin H. Gutknecht, Stefan Röllin
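To show why the Chebyshev iteration needs no inner products, here is a minimal sketch of a standard three-term Chebyshev recurrence for a symmetric positive definite system (not necessarily the variant studied by the authors). It assumes bounds `0 < lam_min < lam_max` on the spectrum are known in advance; the function names and the tridiagonal test matrix are illustrative assumptions. Each step costs one matrix-vector product plus vector updates, and the step coefficients come from a scalar recurrence that depends only on the spectral bounds.

```python
from typing import Callable, List

Vector = List[float]


def chebyshev_solve(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    x0: Vector,
    lam_min: float,
    lam_max: float,
    iters: int,
) -> Vector:
    """Chebyshev iteration for A x = b, A symmetric positive definite.

    Assumes 0 < lam_min < lam_max enclose the eigenvalues of A. The
    coefficients rho depend only on these bounds, so the loop contains
    no dot products (no global reductions on a parallel machine).
    """
    theta = (lam_max + lam_min) / 2.0  # centre of the spectral interval
    delta = (lam_max - lam_min) / 2.0  # half-width of the interval
    sigma1 = theta / delta

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]  # initial residual
    rho = 1.0 / sigma1
    d = [ri / theta for ri in r]  # first correction

    for _ in range(iters):  # fixed count: no residual norm is computed
        x = [xi + di for xi, di in zip(x, d)]
        ad = matvec(d)
        r = [ri - adi for ri, adi in zip(r, ad)]
        rho_next = 1.0 / (2.0 * sigma1 - rho)
        d = [rho_next * rho * di + (2.0 * rho_next / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_next
    return x


def tridiag_matvec(v: Vector) -> Vector:
    """Hypothetical test operator tridiag(-1, 4, -1); eigenvalues lie in (2, 6)."""
    n = len(v)
    out = []
    for i in range(n):
        s = 4.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


if __name__ == "__main__":
    x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = tridiag_matvec(x_true)
    x = chebyshev_solve(tridiag_matvec, b, [0.0] * 5, 2.0, 6.0, iters=60)
    print([round(xi, 6) for xi in x])
```

The sketch runs a fixed number of steps so that the loop stays free of dot products; a practical solver would still check a residual norm occasionally, and it pays for the missing inner products by needing reasonable spectral bounds up front.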
WWW 2005, ACM
LSH forest: self-tuning indexes for similarity search
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
Mayank Bawa, Tyson Condie, Prasanna Ganesan
ASPLOS 1996, ACM
An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System
On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance...
Sandhya Dwarkadas, Alan L. Cox, Willy Zwaenepoel
ICS 2009, Tsinghua U.
MPI-aware compiler optimizations for improving communication-computation overlap
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...