Sciweavers

HPDC 2010 (IEEE)
Multi-GPU volume rendering using MapReduce
In this paper, we present a multi-GPU parallel volume rendering implementation built using the MapReduce programming model. We give implementation details of the library, including s...
Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, John ...
IPPS 2009 (IEEE)
A cross-input adaptive framework for GPU program optimizations
Recent years have seen a trend toward using graphics processing units (GPUs) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel ar...
Yixun Liu, Eddy Z. Zhang, Xipeng Shen
ESCIENCE 2006 (IEEE)
Scientific Workflows: More e-Science Mileage from Cyberinfrastructure
We view scientific workflows as the domain scientist's way to harness cyberinfrastructure for e-Science. Domain scientists are often interested in "end-to-end" fram...
Bertram Ludäscher, Shawn Bowers, Timothy M. M...
EUROPAR 2011 (Springer)
A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard ...
Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanim...
IPPS 2000 (IEEE)
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...