Sciweavers

High Performance Matrix Multiplication on Many Cores
SC
1995
ACM
Parallel Matrix-Vector Product Using Approximate Hierarchical Methods
Matrix-vector products (mat-vecs) form the core of iterative methods used for solving dense linear systems. Often, these systems arise in the solution of integral equations used i...
Ananth Grama, Vipin Kumar, Ahmed H. Sameh
ISLPED
2003
ACM
A mixed-clock issue queue design for globally asynchronous, locally synchronous processor cores
Ever shrinking device sizes and innovative micro-architectural and circuit design techniques have made it possible to have multi-million transistor systems running at multi-gigahe...
Venkata Syam P. Rapaka, Diana Marculescu
ISCA
2009
IEEE
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management policies (...
Yuejian Xie, Gabriel H. Loh
ARC
2012
Springer
A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem
Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This pap...
Abid Rafique, Nachiket Kapre, George A. Constantin...
ARC
2008
Springer
A High Throughput FPGA-based Floating Point Conjugate Gradient Implementation
As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applica...
Antonio Roldao Lopes, George A. Constantinides