Sciweavers

140 search results - page 12 / 28
» Profiling and mapping of parallel workloads on network proce...
IPPS
2005
IEEE
14 years 2 months ago
Reducing Power with Performance Constraints for Parallel Sparse Applications
Sparse and irregular computations constitute a large fraction of applications in the data-intensive scientific domain. While every effort is made to balance the computational wor...
Guangyu Chen, Konrad Malkowski, Mahmut T. Kandemir...
ICPP
2009
IEEE
14 years 3 months ago
Speeding Up Distributed MapReduce Applications Using Hardware Accelerators
In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system archi...
Yolanda Becerra, Vicenç Beltran, David Carr...
HIPC
2009
Springer
13 years 6 months ago
Distance-aware round-robin mapping for large NUCA caches
In many-core architectures, memory blocks are commonly assigned to the banks of a NUCA cache by following a physical mapping. This mapping assigns blocks to cache banks in a round-...
Alberto Ros, Marcelo Cintra, Manuel E. Acacio, Jos...
ICA3PP
2010
Springer
14 years 1 month ago
Modular Resultant Algorithm for Graphics Processors
In this paper we report on the recent progress in computing bivariate polynomial resultants on Graphics Processing Units (GPU). Given two polynomials in Z[x, y], our algo...
Pavel Emeliyanenko
HPCA
1997
IEEE
14 years 1 month ago
A Performance Comparison of Hierarchical Ring- and Mesh-Connected Multiprocessor Networks
This paper compares the performance of hierarchical ring- and mesh-connected wormhole routed shared memory multiprocessor networks in a simulation study. Hierarchical rings are in...
Govindan Ravindran, Michael Stumm