Cluster systems have been gradually more popular and are being broadly used in a variety of applications. On the other hand, many of those systems are not tolerant to system failu...
By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagno...
Srikanth Kandula, Ratul Mahajan, Patrick Verkaik, ...
This paper presents an experimental evaluation of the fault-tolerant communication (FTCOM) layer of the DECOS integrated architecture. The FTCOM layer implements different agreemen...
Jonny Vinter, Henrik Eriksson, Astrit Ademaj, Bern...
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
Management of large-scale parallel and distributed applications is an extremely complex task due to factors such as centralized management architectures, lack of coordination and ...