A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require...
Pen-based computing has not yet taken off, partly because of the lack of fast and easy text input methods. The situation is even worse for people using East Asian languages, where...
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...
Ridge logistic regression has been used successfully in text categorization problems, and it has been shown to reach the same performance as the Support Vector Machine but with...