Practitioners and researchers often refer to error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) yet differ drastically in the severity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic, such as one record with 20 fields in error, or random, such as 20 errors scattered throughout the database. The difference lies in the degree of randomness, or complexity, of the error pattern. We expand the accuracy metric to include a complexity (randomness) measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure, and the main candidate for the probability distribution parameter is Poisson’s lambda. The newly described metric allows management to distinguis...
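The abstract does not spell out the computation, so the following Python sketch is only an illustration under stated assumptions. Cell-level errors are assumed to be recorded in a 0/1 mask (1 = cell in error). The mask is assumed to be flattened record by record into a binary string before measuring complexity, and the LZ measure is taken to be the 1976 Lempel-Ziv parsing (number of components). λ is estimated as the mean number of errors per record, which is its maximum-likelihood estimate under a Poisson model. The helper names (`error_rate`, `lz76_complexity`, `poisson_tail`, and so on) and the tail-probability check are hypothetical, not the authors' method. The sketch builds the two databases the abstract contrasts: 20 records by 20 fields, each with 20 bad cells.

```python
import math
import random


def error_rate(mask):
    """Cells in error divided by total cells (mask: one list of 0/1 flags per record)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def accuracy(mask):
    """Correct cells divided by total cells."""
    total = sum(len(row) for row in mask)
    return (total - sum(sum(row) for row in mask)) / total


def flatten(mask) -> str:
    """Assumption: row-major (record-by-record) 0/1 string of the error mask."""
    return "".join(str(flag) for row in mask for flag in row)


def lz76_complexity(bits: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parsing of a string."""
    n, p, c = len(bits), 0, 0
    while p < n:
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the copy may overlap the component itself, except its last symbol).
        while p + length <= n and bits[p:p + length] in bits[:p + length - 1]:
            length += 1
        c += 1  # a new component ends here (or at the end of the input)
        p += length
    return c


def poisson_lambda(mask) -> float:
    """Hypothetical choice: MLE of lambda = mean number of errors per record."""
    return sum(sum(row) for row in mask) / len(mask)


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly to avoid 1 - cdf cancellation."""
    if k <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    upper = int(max(k, lam) + 20 * math.sqrt(lam) + 50)  # covers the bulk of the mass
    return sum(
        math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
        for j in range(k, upper)
    )


def make_mask(records, fields, error_cells):
    """Build a records x fields 0/1 mask with the given (record, field) cells in error."""
    mask = [[0] * fields for _ in range(records)]
    for r, f in error_cells:
        mask[r][f] = 1
    return mask


RECORDS, FIELDS, ERRORS = 20, 20, 20
# Database A: one record (index 3) with all 20 fields in error (systematic).
systematic = make_mask(RECORDS, FIELDS, [(3, f) for f in range(FIELDS)])
# Database B: 20 errors scattered at random (fixed seed for repeatability).
rng = random.Random(42)
cells = rng.sample([(r, f) for r in range(RECORDS) for f in range(FIELDS)], ERRORS)
scattered = make_mask(RECORDS, FIELDS, cells)

for name, mask in [("systematic", systematic), ("scattered", scattered)]:
    worst = max(sum(row) for row in mask)  # most errors found in a single record
    lam = poisson_lambda(mask)
    print(name, error_rate(mask), accuracy(mask),
          lz76_complexity(flatten(mask)), lam, poisson_tail(worst, lam))
```

Both databases report a 5% error rate (95% accuracy) and the same λ of 1.0. The systematic one, however, parses into only four LZ components, while the scattered one needs many more. Its worst record (20 errors) is also extremely improbable under Poisson(1). The flattening order is itself a modeling choice that can change the complexity value.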
Craig W. Fisher, Eitel J. M. Lauría, Caroly