SIGIR
2008
ACM

Local text reuse detection

Text reuse occurs in many different types of documents and for many different reasons. One form of reuse, duplicate or near-duplicate documents, has been a focus of researchers because of its importance in Web search. Local text reuse occurs when sentences, facts or passages, rather than whole documents, are reused and modified. Detecting this type of reuse can be the basis of new tools for text analysis. In this paper, we introduce a new approach to detecting local text reuse and compare it to other approaches. This comparison involves a study of the amount and type of reuse that occurs in real documents, including TREC newswire and blog collections.
Categories and Subject Descriptors: H.3.1 [Content Analysis and Indexing]: Indexing methods
General Terms: Algorithms, Measurement, Experimentation
Keywords: Text reuse, fingerprinting, information flow
Jangwon Seo, W. Bruce Croft
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where SIGIR
Authors Jangwon Seo, W. Bruce Croft