The French government launched a public debate in 2000 about the conservation of data and electronic documents. Due to the widespread use of Internet and extranet technologies, especial...
This position paper presents an algorithm that determines similarities between text documents. These text documents are indexed with keywords and further background knowledge-ter...
This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...
Abstract of invited paper. Document management has many aspects, among them acquisition, storage, retrieval, presentation, and processing of documents (workflow). These aspects will...
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We pres...