Better understanding of a document's logical components is crucial to many applications, e.g., document classification or data integration. As the development of digital libraries...
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, suc...
Raymond Kosala, Hendrik Blockeel, Maurice Bruynoog...
MATLAB provides a powerful environment for rapid prototyping of research methods and techniques. Across the wide range of on-line pen computing applications there exists a series ...
We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifie...