A major problem that arises from integrating different databases is the existence of duplicates. Data cleaning is the process of identifying two or more records within the datab...
In this paper we propose a new operator for advanced exploration of large multidimensional databases. The proposed operator can automatically generalize from a specific problem c...
The problem of estimating progress for long-running queries has recently been introduced. We analyze the characteristics of the progress estimation problem, from the perspective o...
Algorithms based on transversal hypergraph enumeration can be efficient in mining frequent itemsets; however, it is difficult to apply them to sequence mining problems. In this ...
Dong (Haoyuan) Li, Anne Laurent, Maguelonne Teisse...
We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems requir...