Search Sciweavers | Sciweavers

482 search results - page 25 / 97

» A large-scale study of failures in high-performance computin...

HPDC
2010
IEEE

ROARS: a scalable repository for data intensive scientific computing

15 years 4 months ago

Download cse.nd.edu

As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide b...

Hoang Bui, Peter Bui, Patrick J. Flynn, Douglas Th...

CCGRID
2009
IEEE

Performance under Failures of DAG-based Parallel Computing

15 years 10 months ago

Download www.cs.iit.edu

As the scale and complexity of parallel systems continue to grow, failures become more and more an inevitable fact for solving large-scale applications. In this research, we pr...

Hui Jin, Xian-He Sun, Ziming Zheng, Zhiling Lan, B...

ICPP
2009
IEEE

Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems

15 years 10 months ago

Download nowlab.cse.ohio-state.edu

Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...

Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...

PDP
2002
IEEE

On the Impossibility of Implementing Perpetual Failure Detectors in Partially Synchronous Systems

15 years 8 months ago

Download gsyc.escet.urjc.es

In this paper we study the implementability of different classes of failure detectors in several models of partial synchrony. We show that no failure detector with perpetual accur...

Mikel Larrea, Antonio Fernández, Sergio Ar...

CCGRID
2009
IEEE

Failure-Aware Construction and Reconfiguration of Distributed Virtual Machines for High Availability Computing

15 years 7 months ago

Download www.cse.unt.edu

In large-scale clusters and computational grids, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operatio...

Song Fu
