Sciweavers

IEEEPACT
2006
IEEE
14 years 1 month ago
Whole-program optimization of global variable layout
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...
CIC
2003
13 years 8 months ago
Performance Modeling of a Cluster of Workstations
Using off-the-shelf commodity workstations to build a cluster for parallel computing has become a common practice. In studying or designing a cluster of workstations one should ha...
Ahmed M. Mohamed, Lester Lipsky, Reda A. Ammar
CLUSTER
2009
IEEE
13 years 11 months ago
Analytical modeling and optimization for affinity based thread scheduling on multicore systems
This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to e...
Fengguang Song, Shirley Moore, Jack Dongarra
IPPS
2010
IEEE
13 years 5 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
WCE
2007
13 years 8 months ago
Sparse Matrix Multiplication Using UPC
Partitioned global address space (PGAS) languages, such as Unified Parallel C (UPC), have the promise of being productive. Due to the shared address space view that they provide,...
Hoda El-Sayed, Eric Wright