Abstract. This paper presents a fast object class localization framework implemented on a data parallel architecture currently available in recent computers. Our case study, the im...
A novel parallel algorithm for matrix multiplication is presented. It is based on a 1-D hyper-systolic processor abstraction. The procedure can be implemented on all types of para...
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple...
Parallel implementations of programming languages need to control synchronization overheads. Synchronization is essential for ensuring the correctness of parallel code, yet it add...
—To meet the higher data rate requirement of emerging wireless communication technology, numerous parallel turbo decoder architectures have been developed. However, the interleav...
Guohui Wang, Yang Sun, Joseph R. Cavallaro, Yuanbi...