Abstract This vandalism detector uses features primarily derived from a wordpreserving differencing of the text for each Wikipedia article from before and after the edit, along wit...
In this paper, we call the pattern classification problem that consists in assigning a category label to a long audio signal based on its semantic content as Generic Audio Documen...
We present a system that classifies pixels in a document image according to marking type such as machine print, handwriting, and noise. A segmenter module first splits an input ...
In this paper, we report the development and experiments of IBM Content Harvester (CH), a tool to analyze and recover templates and content from word processor created text docume...
The tagging problem in natural language processing is to find a way to label every word in a text as a particular part of speech, e.g., proper noun. An effective way of solving th...