The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditi...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
In this paper, we propose and implement a vector processing system that includes two identical vector microprocessors embedded in two FPGA chips. Each vector microprocessor suppor...
Hongyan Yang, Shuai Wang, Sotirios G. Ziavras, Jie...
The design of high performance multimedia systems in a short time force us to use IP's blocks in many designs. However, their correct integration in a design implies more com...
Background: Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conf...
Future SoCs will contain multiple cores. For workloads with significant parallelism, prior work has shown the benefit of many small, multi-threaded, scalar cores. For workloads th...