Sciweavers

150 search results - page 23 / 30
» A neighborhood-based approach for clustering of linked docum...
CORR
2010
Springer
13 years 6 months ago
A Probabilistic Approach for Learning Folksonomies from Structured Data
Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach t...
Anon Plangprasopchok, Kristina Lerman, Lise Getoor
SDM
2003
SIAM
13 years 8 months ago
Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
The problem of finding clusters in data is challenging when clusters are of widely differing sizes, densities and shapes, and when the data contains large amounts of noise and out...
Levent Ertöz, Michael Steinbach, Vipin Kumar
CIKM
2011
Springer
12 years 7 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov
TREC
2007
13 years 8 months ago
Concordia University at the TREC 2007 QA Track
In this paper, we describe the system we used for the TREC 2007 Question Answering Track. For factoid questions our redundancy-based approach using a modified version of Aranea w...
Majid Razmara, Andrew Fee, Leila Kosseim
SAC
2005
ACM
14 years 28 days ago
A hierarchical naive Bayes mixture model for name disambiguation in author citations
Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web ...
Hui Han, Wei Xu, Hongyuan Zha, C. Lee Giles