We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated fi...
We report on a model of the distribution of job submission interarrival times in supercomputers. Interarrival times are modeled as a consequence of a complicated set of decisions ...
Demand for programming environments to exploit clusters of symmetric multiprocessors (SMPs) is increasing. In this paper, we present a new programming environment, called ParADE, ...
We present a novel approach for decomposing contact/impact computations in which the mesh elements come in contact with each other during the course of the simulation. Effective d...
Buffered CoScheduled MPI (BCS-MPI) introduces a new approach to designing the communication layer for large-scale parallel machines. The emphasis of BCS-MPI is on the global coordinat...
This paper presents a case study of the 10-Gigabit Ethernet (10GbE) adapter from Intel®. Specifically, with appropriate optimizations to the configurations of the 10GbE adapte...
Wu-chun Feng, Justin Gus Hurwitz, Harvey B. Newman...
Oak Ridge National Laboratory installed a 32-processor Cray X1 in March 2003 and will have a 256-processor system installed by October 2003. In this paper we describe our initi...
Thomas H. Dunigan, Mark R. Fahey, James B. White I...
The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This pa...