ICDE
2009
IEEE

Differencing Provenance in Scientific Workflows

Abstract: Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs.
Zhuowei Bao, Sarah Cohen Boulakia, Susan B. Davidson, Anat Eyal, Sanjeev Khanna
Added 20 Oct 2009
Updated 20 Oct 2009
Type Conference
Year 2009
Where ICDE
Authors Zhuowei Bao, Sarah Cohen Boulakia, Susan B. Davidson, Anat Eyal, Sanjeev Khanna