—The ever-increasing computational power of contemporary microprocessors reduces the execution time spent on arithmetic computations (i.e., the computations not involving slow me...
This paper examines the scalable parallel implementation of QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-...
Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, pr...
Hassan Chafi, Jared Casper, Brian D. Carlstrom, Au...
In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm is designed for a ring-architecture with distributed memory and a piece of shared memory...
The development of efficient parallel out-of-core applications is often tedious, because of the need to explicitly manage the movement of data between files and data structures ...