ICAIL
2007
ACM

Essential deduplication functions for transactional databases in law firms

As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes increasingly important. In business enterprises such as law firms, effective retrieval applications depend upon such functionality. Today's Internet-savvy users are not interested in search results containing numerous sets of duplicate documents, whether exact duplicates or near variants. This report addresses our work in the domain of legal information retrieval, working with a large, transactional knowledge management system. We specifically explore the occurrence and treatment of identical, near-identical, and fuzzy duplicate sub-documents ('clauses') in a contracts database. To our knowledge, we are the first to use principled methods to construct a test collection of transactional documents for such research purposes, one which identifies a variety of duplicate types and is deployed to establish...
Jack G. Conrad, Edward L. Raymond
Added 16 Aug 2010
Updated 16 Aug 2010
Type Conference
Year 2007
Where ICAIL