Sciweavers

Search results for: The overhead of consensus failure recovery
CCGRID
2010
IEEE
Team-Based Message Logging: Preliminary Results
Fault tolerance will be a fundamental imperative in the next decade as machines containing hundreds of thousands of cores will be installed at various locations. In this context, ...
Esteban Meneses, Celso L. Mendes, Laxmikant V. Kal...
ICDCS
2012
IEEE
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating-point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
MDM
2010
Springer
ParTAC: A Partition-Tolerant Atomic Commit Protocol for MANETs
The support of distributed atomic transactions in mobile ad-hoc networks (MANETs) is a key requirement for many mobile application scenarios. Atomicity is a fundamental property ...
Brahim Ayari, Abdelmajid Khelil, Neeraj Suri
RTAS
2009
IEEE
Adaptive Failover for Real-Time Middleware with Passive Replication
Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and sys...
Jaiganesh Balasubramanian, Sumant Tambe, Chenyang ...
CBSE
2011
Springer
Rectifying orphan components using group-failover in distributed real-time and embedded systems
Orphan requests are a significant problem for multi-tier distributed systems since they adversely impact system correctness by violating the exactly-once semantics of application...
Sumant Tambe, Aniruddha S. Gokhale