Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
This paper presents a mesh with virtual buses as the bandwidth-efficient implementation of the mesh with multiple broadcasting on which many computational problems can be solved w...
Jong Hyuk Choi, Bong Wan Kim, Kyu Ho Park, Kwang-I...
— Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute dat...
Ann L. Chervenak, Ewa Deelman, Miron Livny, Mei-Hu...
As the technology for high-speed networks has evolved over the last decade, the interconnection of commodity computers (e.g., PCs and workstations) at gigabit rates has become a re...
Mark Baker, Paul A. Farrell, Hong Ong, Stephen L. ...
In modern computer systems loops present a great deal of opportunities for increasing Instruction Level and Thread Level Parallelism. Loop unrolling is a technique used to obtain ...