Efficient lineage tracking for scientific workflows

Data lineage and data provenance are key to the management of scientific data. Not knowing the exact provenance and processing pipeline used to produce a derived data set often renders the data set useless from a scientific point of view. On the positive side, capturing provenance information is facilitated by the widespread use of workflow tools for processing scientific data. The workflow process describes all the steps involved in producing a given data set and, hence, captures its lineage. On the negative side, efficiently storing and querying workflow-based data lineage is not trivial. All existing solutions use recursive queries and even recursive tables to represent the workflows. Such solutions do not scale and are rather inefficient. In this paper, we propose an alternative approach to storing lineage information captured as a workflow process. We use a space- and query-efficient interval representation for dependency graphs and show how to transform arbitrary workflow processe...
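
To make the idea of an interval representation concrete, here is a minimal sketch of the generic interval-labelling technique for trees. It is not taken from the paper, whose own encoding and transformation of arbitrary workflows are cut off in the abstract above. It assumes the dependency graph is already a tree whose root is the derived data set and whose children are its inputs. The names `label_intervals`, `derived_from`, and the sample data sets are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def label_intervals(children: Dict[str, List[str]], root: str) -> Dict[str, Interval]:
    """Assign each node a (start, end) interval with an iterative depth-first walk.

    A node's interval strictly encloses the intervals of everything below it.
    Assumption: `children` describes a tree (no shared inputs, no cycles).
    """
    start: Dict[str, int] = {}
    intervals: Dict[str, Interval] = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All inputs of this node have been numbered; close its interval.
            intervals[node] = (start[node], counter)
        else:
            start[node] = counter
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        counter += 1
    return intervals


def derived_from(intervals: Dict[str, Interval], derived: str, source: str) -> bool:
    """Lineage test without recursion: is `source` nested inside `derived`?"""
    d_start, d_end = intervals[derived]
    s_start, s_end = intervals[source]
    return d_start < s_start and s_end < d_end


# Hypothetical workflow: "final" is computed from "merged" and "calib";
# "merged" is computed from "raw_a" and "raw_b".
workflow = {"final": ["merged", "calib"], "merged": ["raw_a", "raw_b"]}
intervals = label_intervals(workflow, "final")

print(derived_from(intervals, "final", "raw_a"))
print(derived_from(intervals, "merged", "calib"))
```

Once the intervals are stored, a lineage question becomes two integer comparisons, which a database can answer with an ordinary range predicate instead of a recursive query. Handling inputs shared by several steps would need an extra transformation that this sketch does not attempt.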

Thomas Heinis, Gustavo Alonso

Data Lineage | Data Provenance | Database | SIGMOD 2008 | Workflow Based Data

» Theoretical enzyme design using the Kepler scientific workflows on the Grid

» Ecosystems Monitoring An Information Extraction and Event Processing Scientific Workflow

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2008
Where	SIGMOD
Authors	Thomas Heinis, Gustavo Alonso
