A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Structured outputs such as multidimensional vectors or graphs are frequently encountered in real world pattern recognition applications such as computer vision, natural language pr...
Recent work on ontology-based Information Extraction (IE) has tried to make use of knowledge from the target ontology in order to improve semantic annotation results. However, ver...
This paper describes an XPath-based discourse analysis module for Spoken Dialogue Systems that allows the dialogue author to easily manipulate and query both the user input's...
Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The tradi...