Sciweavers

» High-performance computing using accelerators
ASPLOS
2010
ACM
14 years 3 months ago
MacroSS: macro-SIMDization of streaming applications
SIMD (Single Instruction, Multiple Data) engines are an essential part of the processors in various computing markets, from servers to the embedded domain. Although SIMD-enabled a...
Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Ku...
ICS
2010
Tsinghua U.
14 years 1 month ago
Large-scale FFT on GPU clusters
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-i...
Yifeng Chen, Xiang Cui, Hong Mei
ISCA
2010
IEEE
13 years 11 months ago
Translation caching: skip, don't walk (the page table)
This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page t...
Thomas W. Barr, Alan L. Cox, Scott Rixner
BMCBI
2008
13 years 9 months ago
SPRINT: A new parallel framework for R
Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of t...
Jon Hill, Matthew Hambley, Thorsten Forster, Murie...
VLSID
2009
IEEE
14 years 9 months ago
Code Transformations for TLB Power Reduction
The Translation Look-aside Buffer (TLB) is a very important part in the hardware support for virtual memory management implementation of high performance embedded systems. The TLB...
Reiley Jeyapaul, Sandeep Marathe, Aviral Shrivasta...