Sciweavers

IEEEHPCS
2010
Using replication and checkpointing for reliable task management in computational Grids
In grid computing systems, providing fault tolerance is required for both scientific computation and file sharing to increase their reliability. In previous works, several mechani...
Sangho Yi, Derrick Kondo, Bongjae Kim, Geunyoung P...
GPC
2007
Springer
CFR: A Peer-to-Peer Collaborative File Repository System
Abstract. Due to the high availability of the Internet, many large cross-organization collaboration projects, such as SourceForge, grid systems, etc., have emerged. One of the fundam...
Meng-Ru Lin, Ssu-Hsuan Lu, Tsung-Hsuan Ho, Peter L...
IPPS
2006
IEEE
Lossless compression for large scale cluster logs
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s Blue Gene/L, which can acc...
R. Balakrishnan, Ramendra K. Sahoo
ICPADS
2002
IEEE
Sago: A Network Resource Management System for Real-Time Content Distribution
Abstract: Content replication and distribution is an effective technology for reducing the response time of web accesses and has proven quite popular among large Internet cont...
Tzi-cker Chiueh, Kartik Gopalan, Anindya Neogi, Ch...
ICDCS
2012
IEEE
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...