Sciweavers

294 search results - page 53 / 59
» Efficient Execution of Parallel Applications in Multiprogram...
Sort
View
HPCA
1998
IEEE
13 years 11 months ago
Enhancing Memory Use in Simple Coma: Multiplexed Simple Coma
Scalable shared-memory multiprocessors that are designed as Cache-Only Memory Architectures Coma allow automatic replication and migration of data in the main memory. This enhance...
Sujoy Basu, Josep Torrellas
IEEEPACT
2008
IEEE
14 years 1 months ago
Feature selection and policy optimization for distributed instruction placement using reinforcement learning
Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribu...
Katherine E. Coons, Behnam Robatmili, Matthew E. T...
ASPLOS
2009
ACM
14 years 8 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
ARCS
2008
Springer
13 years 9 months ago
An Optimized ZGEMM Implementation for the Cell BE
: The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
IPPS
1996
IEEE
13 years 11 months ago
Dag-Consistent Distributed Shared Memory
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in sof...
Robert D. Blumofe, Matteo Frigo, Christopher F. Jo...