We propose a new fault tolerance metric for XOR-based erasure codes: the minimal erasures list (MEL). A minimal erasure is a set of erasures that leads to irrecoverable data loss and in which every erasure is necessary and sufficient for this to be so. The MEL is the enumeration of all minimal erasures. An XOR-based erasure code has an irregular structure that may permit it to tolerate faults at and beyond its Hamming distance. The MEL completely describes the fault tolerance of an XOR-based erasure code at and beyond its Hamming distance; it is therefore a useful metric for comparing the fault tolerance of such codes. We also propose an algorithm that efficiently determines the MEL of an erasure code. This algorithm uses the structure of the erasure code to efficiently determine the MEL. We show that, in practice, the number of minimal erasures for a given code is much less than the total number of sets of erasures that lead to data loss: in our empirical results for one corpus of co...
Jay J. Wylie, Ram Swaminathan