CLUSTER 2004 | Sciweavers

CLUSTER
2004
IEEE

FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI

As high performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...

Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé

Fault-tolerant grid services using primary-backup: feasibility and performance

Download cseweb.ucsd.edu

The combination of Grid technology and web services has produced an attractive platform for deploying distributed applications: Grid services, as represented by the Open Grid Serv...

Xianan Zhang, Dmitrii Zagorodnov, Matti A. Hiltune...

Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM

Download nowlab.cse.ohio-state.edu

All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...

Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas

An efficient end-host architecture for cluster communication

Download cs-www.bu.edu

Cluster computing environments built from commodity hardware have provided a cost-effective solution for many scientific and high-performance applications. Likewise, middleware te...

Xin Qi, Gabriel Parmer, Richard West

NIC-based offload of dynamic user-defined modules for Myrinet clusters

Download nowlab.cse.ohio-state.edu

Many of the modern networks used to interconnect nodes in cluster-based computing systems provide network interface cards (NICs) that offer programmable processors. Substantial re...

Adam Wagner, Hyun-Wook Jin, Dhabaleswar K. Panda, ...

Communicating efficiently on cluster based grids with MPICH-VMI

Download vmi.ncsa.uiuc.edu

Emerging infrastructure of computational grids composed of Clusters-of-Clusters (CoC) interlinked through high throughput channels promises unprecedented raw compute power for ter...

Avneesh Pant, Hassan Jafri

NWPerf: a system wide performance monitoring tool for large Linux clusters

Download www.ornl.gov

Ryan W. Mooney, Ken P. Schmidt, R. Scott Studham

An evaluation of the close-to-files processor and data co-allocation policy in multiclusters

Download www.pds.ewi.tudelft.nl

In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multipl...

Hashim H. Mohamed, Dick H. J. Epema

A comparison of local and gang scheduling on a Beowulf cluster

Download cs.anu.edu.au

Gang Scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize cont...

Peter E. Strazdins, John Uhlmann

Towards informatic analysis of Syslogs

Download www.cs.sandia.gov

The complexity and cost of isolating the root cause of system problems in large parallel computers generally scales with the size of the system. Syslog messages provide a primary ...

John Stearley
