In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
Astronomy increasingly faces the issue of massive datasets. For instance, the Sloan Digital Sky Survey (SDSS) has so far generated tens of millions of images of distant galaxies, ...
Brigham Anderson, Andrew W. Moore, Andrew Connolly...
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The ...