In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true ...
Seth H. Pugsley, Manu Awasthi, Niti Madan, Naveen ...
This work presents an application case study. Geant4 is a 750,000 line toolkit first designed in the mid-1990s and originally intended only for sequential computation. Intel's...
In this paper we discuss the parallel implementation of the Cholesky factorization of a positive definite symmetric matrix when that matrix is block tridiagonal. While parallel im...
Thuan D. Cao, John F. Hall, Robert A. van de Geijn
An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X − X op(B) = C, where op(A) denotes A or its transpose A^T, is presented. The paral...
Buffered coscheduling is a new methodology that can substantially increase resource utilization, improve response time, and simplify the development of the run-time suppo...