KDD 2009, ACM

Applying syntactic similarity algorithms for enterprise information management

Ludmila Cherkasova, Kave Eshghi, Charles B. Morrey III, Joseph Tucek, Alistair Veitch
HP Laboratories, HPL-2009-90
Keywords: syntactic similarity, enterprise information management, performance modeling, shingling algorithms, content-based chunking algorithms.

To implement content management solutions and enable new applications related to data retention, regulatory compliance, and litigation, enterprises need advanced analytics that uncover relationships among documents, such as content similarity, provenance, and clustering. In this paper, we evaluate the performance of four syntactic similarity algorithms. Three are based on Broder's "shingling" technique; the fourth employs a more recent approach, "content-based chunking". For our experiments, we use a specially designed corpus of documents that includes a set of "similar"...
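The abstract names the two families of techniques it compares without showing how they work. The Python sketch below illustrates both under our own assumptions: Broder-style resemblance (the Jaccard coefficient of two documents' word-shingle sets, plus a min-hash estimate of it) and content-based chunking, where chunk boundaries are chosen by a rolling hash of the content itself. All helper names (`shingles`, `minhash_sketch`, `content_chunks`, etc.) and parameter values (4-word shingles, 64 hash functions, a 16-byte window, divisor 64, 32/512-byte chunk bounds) are hypothetical choices, not taken from the paper. Keyed BLAKE2b hashes stand in for a family of random hash functions, and a simple polynomial rolling hash stands in for the Rabin fingerprints commonly used for chunking. This is not the authors' implementation and does not reproduce the specific three shingling variants or the chunking algorithm they evaluated.

```python
import hashlib

# Hypothetical parameters throughout; none are taken from the paper.


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Set of contiguous w-word shingles (word n-grams) of a document."""
    tokens = text.split()
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Jaccard resemblance |A & B| / |A | B| of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash64(data: bytes, seed: int) -> int:
    # Keyed BLAKE2b stands in for one member of a family of random hash functions.
    digest = hashlib.blake2b(data, digest_size=8, key=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(features: set[tuple[str, ...]], k: int = 64) -> list[int]:
    """Min-hash sketch: the minimum hash of the set under each of k hash functions."""
    if not features:
        raise ValueError("cannot sketch an empty feature set")
    encoded = [" ".join(f).encode() for f in features]
    return [min(_hash64(e, seed) for e in encoded) for seed in range(k)]


def sketch_resemblance(s1: list[int], s2: list[int]) -> float:
    """Fraction of matching sketch positions; estimates the Jaccard resemblance."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


def content_chunks(data: bytes, window: int = 16, divisor: int = 64,
                   min_size: int = 32, max_size: int = 512) -> list[bytes]:
    """Content-based chunking: cut after byte i when a rolling hash of the last
    `window` bytes satisfies h % divisor == divisor - 1, within size bounds."""
    base, mod = 257, (1 << 61) - 1
    top = pow(base, window - 1, mod)  # weight of the byte about to leave the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # append the newest byte
        size = i - start + 1
        if (size >= min_size and h % divisor == divisor - 1) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog near the river bank today"
    doc_b = doc_a.replace("lazy", "sleepy")
    sa, sb = shingles(doc_a), shingles(doc_b)
    print(f"exact resemblance: {resemblance(sa, sb):.2f}")
    print(f"min-hash estimate: {sketch_resemblance(minhash_sketch(sa), minhash_sketch(sb)):.2f}")
    ca = set(content_chunks(doc_a.encode() * 20))
    cb = set(content_chunks(doc_b.encode() * 20))
    print(f"chunk resemblance: {resemblance(ca, cb):.2f}")
```

Because chunk boundaries depend only on local content rather than fixed offsets, an edit in one place tends to change only the chunks near it, which is what makes the set of chunks usable as a similarity feature in the same way as a set of shingles.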
Added 25 Nov 2009
Updated 25 Nov 2009
Type Conference
Year 2009
Where KDD
Authors Ludmila Cherkasova, Kave Eshghi, Charles B. Morrey, Joseph Tucek, Alistair C. Veitch