Sciweavers

Models and Algorithms for Duplicate Document Detection
DIS
2007
Springer
Unsupervised Spam Detection Based on String Alienness Measures
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Mas...
SIGIR
2006
ACM
Feature diversity in cluster ensembles for robust document clustering
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...
Xavier Sevillano, Germán Cobo, Francesc Al...
ICDM
2007
IEEE
Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and Link Analysis Techniques
In this paper, we present Concept Chain Queries (CCQ), a special case of text mining in document collections focusing on detecting links between two topics across text documents. ...
Wei Jin, Rohini K. Srihari, Hung Hay Ho, Xin Wu
ICAPR
2001
Springer
Character Extraction from Interfering Background - Analysis of Double-Sided Handwritten Archival Documents
The seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage poses a serious problem to human readers or OCR systems. This pape...
Chew Lim Tan, Ruini Cao, Qian Wang, Peiyi Shen
KDD
2007
ACM
Detecting research topics via the correlation between graphs and texts
In this paper we address the problem of detecting topics in large-scale linked document collections. Recently, topic detection has become a very active area of research due to its...
Yookyung Jo, Carl Lagoze, C. Lee Giles