Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
This paper studies the automation of fault management of wireless and mobile networks. A network management protocol similar to SNMP with an integrated architecture using mobile ag...
Performance analysis of high performance systems is a difficult task. Current tools have proven successful in analysis tasks but their implementation is limited in several respects...
Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchi...
Detecting races is important for debugging shared-memory parallel programs, because the races result in unintended nondeterministic executions of the programs. Previous on-the- y t...
Abstract. While standard processors achieve supercomputer performance, a performance gap exists between the interconnect of MPP's and COTS. Standard solutions like Ethernet ca...