In distributed-memory message-passing architectures reducing communication cost is extremely important. In this paper, we present a technique to optimize communication globally. O...
Mahmut T. Kandemir, Prithviraj Banerjee, Alok N. C...
Wild populations of organism are often difficult to study in their natural settings. Often, it is possible to infer mating information about these species by genotyping the offspri...
Saad I. Sheikh, Ashfaq A. Khokhar, Tanya Y. Berger...
Tiling has long been used to improve cache performance. Recursion has recently been used as a cache-oblivious method of improving cache performance. Both of these techniques are n...
Joon-Sang Park, Michael Penner, Viktor K. Prasanna
A metascalable (or “design once, scale on new architectures”) parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials...
Ken-ichi Nomura, Richard Seymour, Weiqiang Wang, H...
Abstract. Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors ...