An important technical hurdle blocking the adoption of mobile agent technology is its lack of reliability. Designing a reliable mobile agent system is especially challenging because a mobile agent is potentially affected by the failure of any host that it visits or of any communication link that it needs to traverse. Previous work in this domain has relied on techniques such as periodically checkpointing mobile agent state and restarting the agent once the machine or communication link recovers. Such approaches render an agent unavailable until the failed machine or communication link itself recovers. In this paper, we take an alternative approach based on the premise that a mobile agent can often complete its task in more than one way. We capture such redundancy in non-deterministic constructs in the agent language and maintain state about the agent's actual computational path within its tree of possible computations. We design and implement a distributed recovery scheme that detects a failure, rolls back an age...