Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...
This paper presents a new algorithm for computing the singular value decomposition (SVD) on multilevel memory hierarchy architectures. This algorithm is based on one-sided JRS iter...
Mostafa I. Soliman, Sanguthevar Rajasekaran, Reda ...
Computational Grids potentially offer low cost, readily available, and large-scale high-performance platforms. For the parallel execution of programs, however, computational GRIDs ...
Abdallah Al Zain, Philip W. Trinder, Greg Michaels...
This paper describes a high performance sampling architecture for inference of latent topic models on a cluster of workstations. Our system is faster than previous work by over an...
Future high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low...