We consider the coverage testing problem where we are given a document and a corpus with a limited query interface and asked to find if the corpus contains a near-duplicate of th...
Ali Dasdan, Paolo D'Alberto, Santanu Kolay, Chris ...
While scalable data mining methods are expected to cope with massive Web data, coping with evolving trends in noisy data in a continuous fashion, and without any unnecessary stopp...
It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can b...
Aron Culotta, Michael L. Wick, Robert Hall, Matthe...
Abstract. This paper describes an approach to detect hints on procurement fraud. It was developed within the context of a European Union project on fraud prevention. Procurement fr...
Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. However, as tags are informally defined, continually changing, and ungo...
Giovanni Quattrone, Licia Capra, Pasquale De Meo, ...