We recently proposed a new approach to parallelization, by decomposing the time domain, instead of the conventional space domain. This improves latency tolerance, and we demonstrat...
Some of the most challenging applications to parallelize scalably are the ones that present a relatively small amount of computation per iteration. Multiple interacting performanc...
Loop distribution is one of the most useful techniques to reduce the execution time of parallel applications. Traditionally, loop scheduling algorithms are implemented based on pa...
Abstract. We describe a scalable parallel implementation of the self organizing map (SOM) suitable for datamining applications involving clustering or segmentation against large da...
Richard D. Lawrence, George S. Almasi, Holly E. Ru...
Abstract. An efficient parallel algorithm is presented and tested for computing selected components of H−1 where H has the structure of a Hamiltonian matrix of two-dimensional la...
Lin Lin, Chao Yang, Jianfeng Lu, Lexing Ying, Wein...