Abstract. In this paper we introduce a general framework for hierarchical clustering that deals with both static and dynamic data sets. From this framework, different hierarchical...
Abstract. The Web has been rapidly “deepened” with the prevalence of databases online. On this “deep Web,” numerous sources are structured, providing schema-rich data– Th...
Record linkage is an important data integration task that has many practical uses for matching, merging and duplicate removal in large and diverse databases. However, a quadratic ...
Timothy de Vries, Hui Ke, Sanjay Chawla, Peter Chr...
In this paper, we address a unique problem in Chinese language processing and report on our study on extending a Chinese thesaurus with region-specific words, mostly from the fina...
Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms an...
Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian...