The High Level Architecture (HLA) is a standard for the interoperability and reuse of simulation components, referred to as federates. Large-scale HLA-compliant simulations are built to study complex problems, and they often involve a large number of federates and vast computing resources. Simulation federates running at different locations are liable to failure, and the failure of a single federate can crash the overall simulation execution. This risk increases with the scale of a distributed simulation. Hence, fault tolerance is required to support runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a “Decoupled Federate Architecture”. Our framework exploits the architecture to provide a generic fault-tolerant model that uses a “dynamic substitution” approach to deal with failures. A sender-based method is designed to ensure reliable in-transit message delivery and is coupled with a novel algorithm to perform e...