Abstract— In this work, web-based metrics for semantic similarity computation between words or terms are presented and compared with the state-of-the-art. Starting from the funda...
In this paper, we report the development of, and experiments with, IBM Content Harvester (CH), a tool to analyze and recover templates and content from word-processor-created text docume...
In this paper we propose a model of creation and use of documentation based on the concept of mixed-initiative interaction. In our model, successful single-initiative interaction ...
The difficulty with information retrieval for OCR documents lies in the fact that OCR documents contain a significant number of erroneous words and, unfortunately, most informat...
Despite the increase in email and other forms of digital communication, the use of printed documents continues to increase every year. Many types of printed documents need to be ...
Aravind K. Mikkilineni, Gazi N. Ali, Pei-Ju Chiang...