A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

Download www.biomedcentral.com

Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or ‘workflow’, is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts.

Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python (‘PaPy’). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either l...
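The idea of a workflow as a directed acyclic graph of re-usable components, evaluated as a map over the input data on a pool of workers, can be illustrated with a minimal sketch. This is not PaPy's API: it uses only the Python standard library (Python 3.9+ for `graphlib`), and every name in it (`make_pipeline`, `clean`, `gc_content`, `reverse_complement`, `report`, and the toy sequences) is a hypothetical choice made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def make_pipeline(functions, upstream):
    """Compose node functions into one per-item function.

    functions: node name -> callable (hypothetical components)
    upstream:  node name -> tuple of node names whose outputs feed it;
               a node with no upstream nodes receives the raw input item.
    """
    # Topological order guarantees each node runs after its inputs exist.
    order = list(TopologicalSorter(upstream).static_order())
    consumed = {p for parents in upstream.values() for p in parents}
    sinks = [name for name in order if name not in consumed]

    def process(item):
        results = {}
        for name in order:
            parents = upstream.get(name, ())
            args = [results[p] for p in parents] if parents else [item]
            results[name] = functions[name](*args)
        # Only the terminal nodes of the graph are reported.
        return {name: results[name] for name in sinks}

    return process


# Toy components standing in for real analysis steps.
def clean(seq):
    return seq.strip().upper()


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def report(gc, revcomp):
    return (round(gc, 2), revcomp)


functions = {
    "clean": clean,
    "gc": gc_content,
    "revcomp": reverse_complement,
    "report": report,
}
# Diamond-shaped graph: clean -> (gc, revcomp) -> report
upstream = {
    "clean": (),
    "gc": ("clean",),
    "revcomp": ("clean",),
    "report": ("gc", "revcomp"),
}

process = make_pipeline(functions, upstream)

# The composed function is mapped over the inputs on a small worker pool;
# pool.map preserves input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, [" acgt\n", "GGCC", "aattg"]))

print(results)
```

In this sketch the graph is collapsed into a single function that is mapped over the inputs, so parallelism is per input item. A thread pool is used only to keep the example self-contained; a process pool or remote workers would be the natural substitute for CPU-bound steps.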

Marcin Cieslik, Cameron Mura

Artificial Intelligence | Bioinformatics | BMCBI 2011 | PaPy | Workflows

Post Info

Added	12 May 2011
Updated	12 May 2011
Type	Journal
Year	2011
Where	BMCBI
Authors	Marcin Cieslik, Cameron Mura
