Sciweavers

901 search results - page 161 / 181
» Hiding Communication Latency in Data Parallel Applications
TVLSI
2010
13 years 2 months ago
C-Pack: A High-Performance Microprocessor Cache Compression Algorithm
Microprocessor designers have been torn between tight constraints on the amount of on-chip cache memory and the high latency of off-chip memory, such as dynamic random access memor...
Xi Chen, Lei Yang, Robert P. Dick, Li Shang, Haris...
ASAP
2003
IEEE
14 years 1 month ago
Arbitrary Bit Permutations in One or Two Cycles
Symmetric-key block ciphers encrypt data, providing data confidentiality over the public Internet. For inter-operability reasons, it is desirable to support a variety of symmetric...
Zhijie Shi, Xiao Yang, Ruby B. Lee
ISCA
2000
IEEE
14 years 4 days ago
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimae...
Zhi Alex Ye, Andreas Moshovos, Scott Hauck, Prithv...
PPOPP
2009
ACM
14 years 8 months ago
Comparability graph coloring for optimizing utilization of stream register files in stream processors
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all stre...
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying ...
ICDCS
2009
IEEE
14 years 2 months ago
Explicit Batching for Distributed Objects
Although distributed object systems, for example RMI and CORBA, enable object-oriented programs to be easily distributed across a network, achieving acceptable performance usually...
Eli Tilevich, William R. Cook, Yang Jiao