This paper describes a fault detection mechanism that uses the error codes returned by the stream sockets to locate process failures. Since these errors are generated automaticall...
To improve the whole dependability of large-scale cluster systems, an online fault detection mechanism is proposed in this paper. This mechanism can detect the fault in time befor...