Sciweavers

342 search results - page 16 / 69
» A planning based approach to failure recovery in distributed...
SRDS
2008
IEEE
14 years 1 month ago
Probabilistic Failure Detection for Efficient Distributed Storage Maintenance
Distributed storage systems often use data replication to mask failures and guarantee high data availability. Node failures can be transient or permanent. While the system must ge...
Jing Tian, Zhi Yang, Wei Chen, Ben Y. Zhao, Yafei ...
CLUSTER
2004
IEEE
13 years 7 months ago
MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
HASE
1997
IEEE
13 years 11 months ago
High-Coverage Fault Tolerance in Real-Time Systems Based on Point-to-Point Communication
The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
K. H. Kim, Chittur Subbaraman, Eltefaat Shokri
ICDCS
2002
IEEE
14 years 15 days ago
A Practical Approach for “Zero” Downtime in an Operational Information System
An Operational Information System (OIS) supports a real-time view of an organization’s information critical to its logistical business operations. A central component of an OIS ...
Ada Gavrilovska, Karsten Schwan, Van Oleson
EUROMICRO
2009
IEEE
13 years 11 months ago
Fault-Tolerant BPEL Workflow Execution via Cloud-Aware Recovery Policies
BPEL is the de facto standard for business process modeling in today's enterprises and is a promising candidate for the integration of business and scientific applications tha...
Ernst Juhnke, Tim Dörnemann, Bernd Freisleben