Sciweavers

2512 search results - page 477 / 503
» Software Transactional Memory
Sort
View
TPDS
2010
174views more  TPDS 2010»
13 years 6 months ago
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, QR factorizations to t...
Hatem Ltaief, Jakub Kurzak, Jack Dongarra
ISPASS
2010
IEEE
13 years 5 months ago
Weak execution ordering - exploiting iterative methods on many-core GPUs
Abstract--On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When finegranularity of data communication and synchronization is requ...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,...
CAL
2010
13 years 5 months ago
A Case for Alternative Nested Paging Models for Virtualized Systems
Address translation often emerges as a critical performance bottleneck for virtualized systems and has recently been the impetus for hardware paging mechanisms. These mechanisms ap...
Giang Hoang, Chang Bae, Jack Lange, Lide Zhang, Pe...
CAL
2010
13 years 4 months ago
SMT-Directory: Efficient Load-Load Ordering for SMT
Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any single thread appear to occur in program order to all other threads. Out-of-order execu...
A. Hilton, A. Roth
ASPLOS
2011
ACM
12 years 11 months ago
Sponge: portable stream programming on graphics engines
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...