While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In...
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, Davi...
—Much of dense linear algebra has been successfully blocked to concentrate the majority of its time in the Level 3 BLAS, which are not only efficient for serial computation, but...
Existing dynamic race detectors suffer from at least one of the following three limitations: (i) space overhead per memory location grows linearly with the number of parallel thre...
We present an innovative design of an accurate, 2D DCT IDCT processor, which handles scaled and sub-sampled input blocks efficiently. In the IDCT mode, the latency of the processo...
Rohini Krishnan, Om Prakash Gangwal, Jos T. J. van...