Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems

MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open source implementation, Hadoop, support parallel processing of large datasets, with capabilities including automatic data partitioning and distribution, load balancing, and fault tolerance management. Meanwhile, scientific workflow management systems, e.g., Kepler, Taverna, Triana, and Pegasus, have demonstrated their ability to help domain scientists solve scientific problems by synthesizing different data and computing resources. By integrating Hadoop with Kepler, we provide an easy-to-use architecture that helps users compose and execute MapReduce applications in Kepler scientific workflows. Our implementation demonstrates that many characteristics of scientific workflow management systems, e.g., graphical user interface and component reuse and sharing, complement those of MapReduce. Using the presented Hadoop co...
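
For readers unfamiliar with the programming model the abstract refers to, the following is a minimal single-process sketch of the map, group-by-key, and reduce phases, using word counting as the example. It is an illustration only: it does not use Hadoop's API or Kepler's actors, it performs no data partitioning, load balancing, or fault handling, and the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group-by-key (the "shuffle"), and reduce in one process.

    A real framework would distribute these phases across machines;
    this hypothetical helper only shows the data flow.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Illustrative input, not data from the paper.
lines = [
    "kepler runs workflows",
    "hadoop runs mapreduce jobs",
    "kepler calls hadoop",
]
result = run_mapreduce(lines, map_words, reduce_counts)
```

The user supplies only the map and reduce functions; everything inside `run_mapreduce` stands in for the work that a framework such as Hadoop performs on the user's behalf.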

Jianwu Wang, Daniel Crawl, Ilkay Altintas

Applied Computing | SC 2009 | Scientific Workflow | Scientific Workflow Management | Workflow Management Systems

Post Info

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	SC
Authors	Jianwu Wang, Daniel Crawl, Ilkay Altintas
