Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach t...
Anon Plangprasopchok, Kristina Lerman, Lise Getoor
The problem of finding clusters in data is challenging when clusters are of widely differing sizes, densities and shapes, and when the data contains large amounts of noise and out...
This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
In this paper, we describe the system we used for the trec-2007 Question Answering Track. For factoid questions our redundancy-based approach using a modified version of aranea w...
Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web ...