Data parallelism in bioinformatics workflows using Hydra


Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific workflows. Scientific Workflow Management Systems (SWfMS) are used to orchestrate these workflows and to control and monitor their execution. Bioinformatics experiments commonly process very large datasets, so data parallelism is a common approach to increase performance and reduce overall execution time. However, most current SWfMS still lack support for parallel execution in high performance computing (HPC) environments. Additionally, keeping track of provenance data in distributed environments is still an open yet important problem. Recently, the Hydra middleware was proposed to bridge the gap between the SWfMS and the HPC environment by providing a transparent way for scientists to parallelize workflow executions while capturing distributed provenance. This paper analy...
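
As an illustration of the data-parallel pattern the abstract refers to, here is a minimal Python sketch: the input is split into fragments, the same activity runs on each fragment concurrently, and a small provenance record is kept per fragment before the results are merged. This is not Hydra's API. The function names, the GC-content activity, the provenance fields, and the use of a local thread pool in place of HPC nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def split_into_fragments(records, n_fragments):
    """Partition the input records into at most n_fragments contiguous chunks."""
    if not records:
        return []
    size = -(-len(records) // n_fragments)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def gc_content(sequence):
    """Hypothetical activity: fraction of G/C bases in one sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def run_activity(fragment_id, fragment):
    """Run the activity on one fragment; return its results and a provenance record."""
    started = time.time()
    results = [gc_content(seq) for seq in fragment]
    provenance = {
        "fragment_id": fragment_id,   # which fragment produced these results
        "n_records": len(fragment),   # how much data the fragment held
        "started": started,
        "finished": time.time(),
    }
    return results, provenance


def run_data_parallel(records, n_workers):
    """Fragment the data, process fragments concurrently, merge in input order."""
    fragments = split_into_fragments(records, n_workers)
    # A thread pool stands in for the distributed workers (an assumption).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        outputs = list(pool.map(run_activity, range(len(fragments)), fragments))
    merged = [value for results, _ in outputs for value in results]
    provenance_log = [prov for _, prov in outputs]
    return merged, provenance_log


sequences = ["ATGC", "GGCC", "ATAT", "GCGA", "TTTT"]
merged, provenance_log = run_data_parallel(sequences, n_workers=2)
```

Because `pool.map` returns outputs in submission order, the merged results line up with the original input order, and each provenance record can be traced back to its fragment by `fragment_id`.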


Data Parallelism | Distributed And Parallel Computing | HPDC 2010 | Scientific Workflows | Workflow |


Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	HPDC
Authors	Fábio Coutinho, Eduardo S. Ogasawara, Daniel de Oliveira, Vanessa P. Braganholo, Alexandre A. B. Lima, Alberto M. R. Dávila, Marta Mattoso
