Search Sciweavers | Sciweavers

7271 search results - page 138 / 1455

Search query: Fault-Tolerant Distributed Simulation

ICPP 2007 (IEEE)

Distributed And Parallel Com...

A Meta-Learning Failure Predictor for Blue Gene/L Systems

Available from www.mcs.anl.gov

The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...

Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...

ICPP 2007 (IEEE)

Distributed And Parallel Com...

Fault-Driven Re-Scheduling For Improving System-level Fault Resilience

Available from www.cs.iit.edu

The productivity of HPC systems is determined not only by their performance but also by their reliability. The conventional method to limit the impact of failures is checkpointing...

Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...

ICPP 2007 (IEEE)

Distributed And Parallel Com...

Mercury: Combining Performance with Dependability Using Self-virtualization

Available from ppi.fudan.edu.cn

There has recently been increasing interest in using system virtualization to improve the dependability of HPC cluster systems. However, virtualization is not cost-free and may come with som...

Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, ...

IPPS 2007 (IEEE)

Distributed And Parallel Com...

RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet

Available from www.cecs.uci.edu

PC clusters with a high performance/cost ratio have been one of the typical platforms for high-performance computing. To lower costs, Gigabit Ethernet is often used for intercommuni...

Takayuki Okamoto, Shin'ichi Miura, Taisuke Boku, M...

SRDS 2007 (IEEE)

Operating System

The Fail-Heterogeneous Architectural Model

Available from www.deeds.informatik.tu-darmstadt.de

Fault-tolerant distributed protocols typically use a homogeneous fault model, either fail-crash or fail-Byzantine, in which all processors are assumed to fail in the same manner...

Marco Serafini, Neeraj Suri
