Stalls, due to mis-matches in communication rates, are a major performance obstacle in pipelined circuits. If the rate of data production is faster than the rate of consumption, t...
Abstract— Floating-point arithmetic is notoriously nonassociative due to the limited precision representation which demands intermediate values be rounded to fit in the availabl...
When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be ...
In this paper we present a high-performance, high throughput, and area efficient architecture for the VLSI implementation of the AES algorithm. The subkeys, required for each round...
Naga M. Kosaraju, Murali R. Varanasi, Saraju P. Mo...
We present an architecture and an implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations ar...
Yousef El-Kurdi, Warren J. Gross, Dennis Giannacop...