We investigate the problem of evaluating the performance of text processing algorithms on inputs that contain errors as a result of optical character recognition. A new hierarchic...
In this paper, we propose a new comprehensive methodology for evaluating the performance of recognition techniques for noisy historical documents. We aim to evaluate not only the...
Information-extraction (IE) research typically focuses on clean-text inputs. However, an IE engine serving real applications yields many false alarms due to less-well-formed input...
Radu Florian, John F. Pitrelli, Salim Roukos, Imed...
This article describes an automatic evaluation procedure for NLP system robustness under the strain of noisy and ill-formed input. The procedure requires no manual work o...
We present a critique of language-based modelling for text input research, and propose an alternative input-based approach. Current language-based statistical models are derived fr...