Sciweavers

» Scalable Parallel Matrix Multiplication on Distributed Memor...
PPOPP
2010
ACM
14 years 4 months ago
Data transformations enabling loop vectorization on multithreaded data parallel architectures
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
SIAMSC
2008
13 years 7 months ago
Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel
In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees a...
Hari Sundar, Rahul S. Sampath, George Biros
IPPS
2010
IEEE
13 years 5 months ago
Parallelization of tau-leap coarse-grained Monte Carlo simulations on GPUs
The Coarse-Grained Monte Carlo (CGMC) method is a multi-scale stochastic mathematical and simulation framework for spatially distributed systems. CGMC simulations are important too...
Lifan Xu, Michela Taufer, Stuart Collins, Dionisio...
ICS
2009
Tsinghua U.
14 years 2 months ago
Fast and scalable list ranking on the GPU
General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest perfo...
M. Suhail Rehman, Kishore Kothapalli, P. J. Naraya...
IPPS
1999
IEEE
13 years 12 months ago
Cascaded Execution: Speeding Up Unparallelized Execution on Shared-Memory Multiprocessors
Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us tha...
Ruth E. Anderson, Thu D. Nguyen, John Zahorjan