Search Sciweavers | Sciweavers

482 search results - page 3 / 97

» A large-scale study of failures in high-performance computin...

ICPPW
2008
IEEE

93 views, Distributed And Parallel Com...

Simulating Failures on Large-Scale Systems

15 years 9 months ago

Download www.mcs.anl.gov

Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...

Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...

CCGRID
2006
IEEE

130 views, Distributed And Parallel Com...

A Failure-Aware Scheduling Strategy in Large-Scale Cluster System

15 years 9 months ago

Download www.ncic.ac.cn

As their scale expands, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...

Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...

JSSPP
2004
Springer

143 views, Distributed And Parallel Com...

Performance Implications of Failures in Large-Scale Cluster Scheduling

15 years 8 months ago

Download www.ece.rutgers.edu

As we continue to evolve into large-scale parallel systems, many of them employing hundreds of computing engines to take on mission-critical roles, it is crucial to design those s...

Yanyong Zhang, Mark S. Squillante, Anand Sivasubra...

DSN
2004
IEEE

148 views, Computer Networks

Cluster-Based Failure Detection Service for Large-Scale Ad Hoc Wireless Network Applications

15 years 7 months ago

Download www.ia-tech.com

The growing interest in ad hoc wireless network applications that are made of large and dense populations of lightweight system resources calls for scalable approaches to fault to...

Ann T. Tai, Kam S. Tso, William H. Sanders

SC
2000
ACM

109 views, Applied Computing

The Failure of TCP in High-Performance Computational Grids

15 years 7 months ago

Download www.sc2000.org

Distributed computational grids depend on TCP to ensure reliable end-to-end communication between nodes across the wide-area network (WAN). Unfortunately, TCP performance can be a...

Wu-chun Feng, Peerapol Tinnakornsrisuphap
