In many text classification applications, it is appealing to treat every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
The research described in this paper is concerned with the application of information retrieval to software maintenance, and in particular to the problem of recovering traceabilit...
Nowadays, multimedia documents composed of text and images are increasingly used, thanks to the Internet and the increasing capacity of data storage. It is more and more ...
This paper addresses the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
As the Web provides rich data embedded in the immense contents inside pages, we witness many ad hoc efforts to exploit fine-grained information across Web text, such as We...