Sciweavers

733 search results - page 107 / 147
» High performance in tree-based parallel architectures
ASPLOS
2009
ACM
14 years 9 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
IEEEPACT
2005
IEEE
14 years 2 months ago
Maximizing CMP Throughput with Mediocre Cores
In this paper we compare the performance of area equivalent small, medium, and large-scale multithreaded chip multiprocessors (CMTs) using throughput-oriented applications. We use...
John D. Davis, James Laudon, Kunle Olukotun
ICCS
2009
Springer
14 years 3 months ago
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; therefore, computational scientists are often forced to ...
Boyana Norris, Albert Hartono, Elizabeth R. Jessup...
HPCA
2006
IEEE
14 years 9 months ago
An approach for implementing efficient superscalar CISC processors
An integrated, hardware / software co-designed CISC processor is proposed and analyzed. The objectives are high performance and reduced complexity. Although the x86 ISA is targete...
Shiliang Hu, Ilhyun Kim, Mikko H. Lipasti, James E...
DPHOTO
2009
13 years 6 months ago
Interleaved imaging: an imaging system design inspired by rod-cone vision
Under low illumination conditions, such as moonlight, there simply are not enough photons present to create a high quality color image with integration times that avoid camera-sha...
Manu Parmar, Brian A. Wandell