As grid computation systems become larger and more complex, manually diagnosing failures in jobs becomes impractical. Recently, machine-learning techniques have been proposed to d...
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
Large scale production computing grids introduce new challenges in debugging and troubleshooting. A user that submits a workload consisting of tens of thousands of jobs to a grid ...
Development of advanced anomaly detection and failure diagnosis technologies for spacecraft is a quite significant issue in the space industry, because the space environment is ha...
Process monitoring refers to the task of detecting abnormal process operations resulting from the shift in the mean and/or the variance of one or more process variables. To success...