Sciweavers

1263 search results - page 241 / 253
» Scatter-Add in Data Parallel Architectures
Sort
View
CHES
2007
Springer
327views Cryptology» more  CHES 2007»
14 years 1 months ago
On the Power of Bitslice Implementation on Intel Core2 Processor
Abstract. This paper discusses the state-of-the-art fast software implementation of block ciphers on Intel’s new microprocessor Core2, particularly concentrating on “bitslice i...
Mitsuru Matsui, Junko Nakajima
ICS
2009
Tsinghua U.
14 years 11 days ago
Exploring pattern-aware routing in generalized fat tree networks
New static source routing algorithms for High Performance Computing (HPC) are presented in this work. The target parallel architectures are based on the commonly used fattree netw...
Germán Rodríguez, Ramón Beivi...
HPCA
2006
IEEE
14 years 8 months ago
LogTM: log-based transactional memory
Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes p...
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan...
SIGMOD
2007
ACM
179views Database» more  SIGMOD 2007»
14 years 8 months ago
How to barter bits for chronons: compression and bandwidth trade offs for database scans
Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query p...
Allison L. Holloway, Vijayshankar Raman, Garret Sw...
PDCAT
2009
Springer
14 years 2 months ago
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
Abstract—In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not supp...
Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu,...