Subspace clustering techniques were proposed to discover hidden clusters that exist only in certain subsets of the full feature space. However, the time complexity of such algorit...
In earlier work, we extended the TPC-C benchmark with basic and complex schema transformations. This paper uses this benchmark to investigate the blocking behaviour of online s...
This paper presents and evaluates an alternative sorting component for Hadoop based on the replacement selection algorithm. In comparison with the default quicksort-based implement...
Web archiving is the process of collecting and preserving web content in an archive for current and future generations. One of the key issues in web archiving is that not all websi...
This paper demonstrates a system called HBelt, which tightly integrates a distributed key-value data store, HBase, with an extended ETL engine, Kettle. The objective is to provide HBa...
Weiping Qu, Sahana Shankar, Sandy Ganza, Stefan De...
The well-known problems of tuning and self-tuning of data management systems are amplified in the context of cloud environments that promise self-management along with properties ...
When dealing with large amounts of data, exact query answering is not always feasible. We propose a query approximation method that, given an upper bound on the amount of data that...