This paper describes a text normalization system for deletion-based abbreviations in informal text. We propose using statistical classifiers to learn the probability of deleting ...
In recent years, new semistatic word-based byte-oriented text compressors, such as Tagged Huffman and those based on Dense Codes, have shown that it is possible to perform fast d...
While participating in the HARD track, our first question was what an IR-application should look like that takes into account preference meta-data from the user, without the need ...
Abstract. The number of features to be considered in a text classification system is given by the size of the vocabulary and this is normally in the range of the tens or hundreds o...
David Vilar, Hermann Ney, Alfons Juan, Enrique Vid...
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...