This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common ...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...