We previously introduced the massively parallel global cellular automata (GCA) model. Parallel algorithms derived from applications can be mapped straightforwardly onto this model. In thi...
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk-I/...
Priya Natarajan, Thomas H. Cormen, Elena Riccio St...
Recently, graphics processing units (GPUs) have been providing increasingly high performance with programmable internal processors, namely vertex processors (VPs) and fragment process...
The Collaborative Computing Transport Layer (CCTL) is a communication substrate consisting of a suite of multiparty protocols, providing varying service qualities among process gr...
Injong Rhee, Shun Yan Cheung, Phillip W. Hutto, Va...
This paper describes a toolset, PACE, that provides detailed predictive performance information throughout the implementation and execution stages of an application. It is structur...
Darren J. Kerbyson, John S. Harper, Efstathios Pap...