This paper describes a language-independent, scalable system for both challenges of crossdocument co-reference: name variation and entity disambiguation. We provide system results...
Skew estimation and page segmentation are the two closely related processing stages for document image analysis. Skew estimation needs proper page segmentation, especially for doc...
The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search sy...
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
The case-based reasoning approach to email response consists of reusing past messages to synthesize new responses to incoming requests. This task presents various challenges due to...