We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics, and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.

Categories and Subject Descriptors: H.2.1 [Database Management]: Data Models

General Terms: Theory, Algorithms

Keywords: Data provenance, data lineage, incomplete databases, probabilistic databases, semirings, datalog, formal power series
Todd J. Green, Gregory Karvounarakis, Val Tannen
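
To make the central claim of the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how one query evaluator can serve several semantics when tuples carry annotations from a semiring. The names `Semiring`, `union`, `project`, `join`, and `query`, and the dictionary representation of an annotated relation, are assumptions introduced here for illustration. Union and projection combine annotations with the semiring's addition, and join combines them with its multiplication.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical representation: an annotated relation maps each tuple
# to its annotation in the chosen semiring.
Relation = Dict[Tuple, Any]


@dataclass(frozen=True)
class Semiring:
    zero: Any                          # additive identity
    one: Any                           # multiplicative identity
    plus: Callable[[Any, Any], Any]    # combines alternative derivations
    times: Callable[[Any, Any], Any]   # combines joint use of tuples


# Boolean semiring: ordinary set semantics.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Natural numbers with + and *: bag semantics (tuple multiplicities).
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def union(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Union adds the annotations of tuples present in both inputs."""
    out = dict(r)
    for t, a in s.items():
        out[t] = k.plus(out.get(t, k.zero), a)
    return out


def project(k: Semiring, r: Relation, cols: Tuple[int, ...]) -> Relation:
    """Projection adds the annotations of tuples that collapse together."""
    out: Relation = {}
    for t, a in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = k.plus(out.get(key, k.zero), a)
    return out


def join(k: Semiring, r: Relation, s: Relation, r_col: int, s_col: int) -> Relation:
    """Equijoin multiplies the annotations of the matching tuples."""
    out: Relation = {}
    for t, a in r.items():
        for u, b in s.items():
            if t[r_col] == u[s_col]:
                key = t + u
                out[key] = k.plus(out.get(key, k.zero), k.times(a, b))
    return out


def query(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Join r and s on r's column 1 = s's column 0, then keep columns 1 and 3."""
    return project(k, join(k, r, s, 1, 0), (1, 3))


# The same query, evaluated under two semirings.
R = {("a", "b"): 2, ("c", "b"): 3}   # multiplicities in the bag reading
S = {("b", "x"): 5}

bag_result = query(BAG, R, S)
bool_result = query(BOOL, {t: True for t in R}, {t: True for t in S})

print(bag_result)    # {('b', 'x'): 25}
print(bool_result)   # {('b', 'x'): True}
```

Only the `Semiring` value passed to `query` changes between the two runs; the evaluator itself is untouched. Under these assumptions, a further semantics would be obtained by supplying another semiring, for example one whose elements are polynomials over tuple identifiers.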