This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...
Existing distributed hash tables provide efficient mechanisms for storing and retrieving a data item based on an exact key, but are unsuitable when the search key is similar, but ...
Many document-based applications, including popular Web browsers, email viewers, and word processors, have a ‘Find on this Page’ feature that allows a user to find every occur...
Kevyn Collins-Thompson, Charles Schweizer, Susan T...
This paper presents a simple and efficient algorithm for approximate dictionary matching designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. ...
We describe a model for strings of characters that is loosely based on the Lempel Ziv model with the addition that a repeated substring can be an approximate match to the original...