Sciweavers

1022 search results - page 136 / 205
» Automatic data and computation decomposition on distributed ...
SIAMSC
2011
12 years 10 months ago
A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calcu
Abstract. An efficient parallel algorithm is presented and tested for computing selected components of H⁻¹ where H has the structure of a Hamiltonian matrix of two-dimensional la...
Lin Lin, Chao Yang, Jianfeng Lu, Lexing Ying, Wein...
HPCA
2006
IEEE
14 years 8 months ago
Completely verifying memory consistency of test program executions
An important means of validating the design of commercial-grade shared memory multiprocessors is to run a large number of pseudo-random test programs on them. However, when intent...
Chaiyasit Manovit, Sudheendra Hangal
ICS
1999
Tsinghua U.
13 years 12 months ago
Eliminating synchronization bottlenecks in object-based programs using adaptive replication
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
Martin C. Rinard, Pedro C. Diniz
ICS
1993
Tsinghua U.
13 years 11 months ago
Anatomy of a Message in the Alewife Multiprocessor
Shared-memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-pas...
John Kubiatowicz, Anant Agarwal
ICS
2001
Tsinghua U.
14 years 5 days ago
Tools for application-oriented performance tuning
Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance...
John M. Mellor-Crummey, Robert J. Fowler, David B....