Sciweavers

» Scalable Parallel Matrix Multiplication on Distributed Memor...
IPPS
2005
IEEE
14 years 1 month ago
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
This paper presents the design and implementation of a thread virtual machine, called TNT (or TiNy-Threads) for the IBM Cyclops64 architecture (the latest Cyclops architecture tha...
Juan del Cuvillo, Weirong Zhu, Ziang Hu, Guang R. ...
PPOPP
2005
ACM
14 years 1 month ago
Revocable locks for non-blocking programming
In this paper we present a new form of revocable lock that streamlines the construction of higher-level concurrency abstractions such as atomic multi-word heap updates. The key id...
Tim Harris, Keir Fraser
IWOMP
2009
Springer
14 years 2 months ago
Scalability Evaluation of Barrier Algorithms for OpenMP
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is ...
Ramachandra C. Nanjegowda, Oscar Hernandez, Barbar...
ARC
2008
Springer
13 years 9 months ago
A High Throughput FPGA-based Floating Point Conjugate Gradient Implementation
As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applica...
Antonio Roldao Lopes, George A. Constantinides
ICPP
2002
IEEE
14 years 20 days ago
Analysis of Memory Hierarchy Performance of Block Data Layout
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In...
Neungsoo Park, Bo Hong, Viktor K. Prasanna