Sciweavers

38 search results - page 7 / 8
» Data partition: A Practical Parallel Evaluation of Datalog P...
Sort
View
IEEEPACT
2006
IEEE
15 years 10 months ago
Whole-program optimization of global variable layout
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...
CIC
2003
150views Communications» more  CIC 2003»
15 years 5 months ago
Performance Modeling of a Cluster of Workstations
Using off-the-shelf commodity workstations to build a cluster for parallel computing has become a common practice. In studying or designing a cluster of workstations one should ha...
Ahmed M. Mohamed, Lester Lipsky, Reda A. Ammar
CLUSTER
2009
IEEE
15 years 8 months ago
Analytical modeling and optimization for affinity based thread scheduling on multicore systems
Abstract--This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to e...
Fengguang Song, Shirley Moore, Jack Dongarra
IPPS
2010
IEEE
15 years 2 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixedfunction processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
WCE
2007
15 years 5 months ago
Sparse Matrix Multiplication Using UPC
—Partitioned global address space (PGAS) languages, such as Unified Parallel C (UPC) have the promise of being productive. Due to the shared address space view that they provide,...
Hoda El-Sayed, Eric Wright