ADBIS
2015
Springer

HBelt: Integrating an Incremental ETL Pipeline with a Big Data Store for Real-Time Analytics

This paper demonstrates HBelt, a system that tightly integrates the distributed key-value data store HBase with an extended version of the ETL engine Kettle. The objective is to provide HBase tables with real-time data freshness in an efficient manner. A distributed ETL engine is extended and integrated as an overlay of HBase. In addition, this ETL engine is extended with the capability of processing incremental ETL flows in a pipelined fashion. Delta batches are defined by the MVCC component in HBase to flush the incremental ETL pipeline for multiple concurrent read requests. Experimental results show that high query throughput can be achieved in HBelt for real-time analytics.
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ADBIS
Authors Weiping Qu, Sahana Shankar, Sandy Ganza, Stefan Dessloch