Sciweavers

181 search results - page 18 / 37
USENIX
2008
13 years 9 months ago
FlexVol: Flexible, Efficient File Volume Virtualization in WAFL
Virtualization is a well-known method of abstracting physical resources and of separating the manipulation and use of logical resources from their underlying implementation. We have used ...
John K. Edwards, Daniel Ellard, Craig Everhart, Ro...
ICDCS
2008
IEEE
14 years 2 months ago
stdchk: A Checkpoint Storage System for Desktop Grid Computing
Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
HOTOS
2009
IEEE
13 years 11 months ago
On Availability of Intermediate Data in Cloud Computations
This paper takes a renewed look at the problem of managing intermediate data that is generated during dataflow computations (e.g., MapReduce, Pig, Dryad, etc.) within clouds. We d...
Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil G...
CCGRID
2011
IEEE
12 years 11 months ago
High Performance Pipelined Process Migration with RDMA
Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy to achieve fault-tolerance. However, C/R by itself is not capable enough to meet the demands of upcoming exasc...
Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavi...
SIGMETRICS
2011
ACM
12 years 10 months ago
Record and transplay: partial checkpointing for replay debugging across heterogeneous systems
Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this pr...
Dinesh Subhraveti, Jason Nieh