Sciweavers

67 search results - page 8 / 14
» Array-of-arrays architecture for parallel floating point mul...
Sort
View
ANCS
2009
ACM
13 years 8 months ago
Design and performance analysis of a DRAM-based statistics counter array architecture
The problem of maintaining efficiently a large number (say millions) of statistics counters that need to be updated at very high speeds (e.g. 40 Gb/s) has received considerable re...
Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim...
ASPLOS
2009
ACM
14 years 11 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
IPPS
1998
IEEE
14 years 2 months ago
Optimizing Data Scheduling on Processor-in-Memory Arrays
In the study of PetaFlop project, Processor-In-Memory array was proposed to be a target architecture in achieving 1015 floating point operations per second computing performance. ...
Yi Tian, Edwin Hsing-Mean Sha, Chantana Chantrapor...
IPPS
2010
IEEE
13 years 8 months ago
Performance modeling of heterogeneous systems
Predicting how well applications may run on modern systems is becoming increasingly challenging. It is no longer sufficient to look at number of floating point operations and commu...
Jan Christian Meyer, Anne C. Elster
CONCURRENCY
2000
83views more  CONCURRENCY 2000»
13 years 10 months ago
Javia: A Java interface to the virtual interface architecture
The Virtual Interface (VI) architecture has become the industry standard for user-level network interfaces. This paper presents the implementation and evaluation of Javia, a Java ...
Chi-Chao Chang, Thorsten von Eicken